This document provides an overview of big data analytics and analytic platforms: why organizations are adopting big data analytics as data types change and technology advances; the architectural techniques, such as massively parallel processing (MPP) and columnar storage, used in analytic platforms; a proposed framework for succeeding with big data analytics through culture, people, organizational structure, architecture, and the use of analytic platforms; and the different types of analytic platforms, along with considerations for purchasing them.
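Columnar storage, mentioned above, speeds up analytic scans because an aggregate reads only the column it touches rather than every field of every row. A minimal pure-Python sketch of the idea (the records and field names are invented for illustration):

```python
# Row-oriented vs. column-oriented layout for an analytic aggregate.
# Records and field names are invented for illustration.

rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 42.0},
]

# Row store: summing one column still touches every whole row.
row_total = sum(r["amount"] for r in rows)

# Column store: each field lives in its own contiguous array, so the
# aggregate scans only the "amount" column and skips everything else.
columns = {
    "order_id": [r["order_id"] for r in rows],
    "region":   [r["region"] for r in rows],
    "amount":   [r["amount"] for r in rows],
}
col_total = sum(columns["amount"])

assert row_total == col_total == 237.5
```

On real analytic platforms the columnar layout also enables compression and vectorized execution per column; this sketch shows only the access-pattern difference.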
Foundation for Success: How Big Data Fits in an Information Architecture - Inside Analysis
BDIA Roundtable
Live Webcast on April 9, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=c84869fcca958d278b210cfca2a023a0
Big Data can offer big value and big challenges, and there are lots of solutions and promises out there. But in order to harness the most insight from Big Data, organizations need to solve pain points with more than triage. Since data challenges continue to permeate the information landscape, businesses would do well to incorporate solutions that fit into the infrastructure and provide a sustainable method for managing and analyzing Big Data.
Register for this Roundtable Webcast to hear veteran Analysts Robin Bloor, Mike Ferguson and Richard Winter as they offer their perspectives on the evolving Big Data industry. They’ll comment on the proposed Big Data Information Architecture, and take questions from the audience. This is the second event of The Bloor Group's Interactive Research Report for 2014 which will focus on illuminating optimal Big Data Information Architectures. The series will include a dozen interviews with today's Big Data visionaries, plus three interactive Webcasts and a detailed findings report.
Visit InsideAnalysis.com for more information.
TDWI Boston Keynote - The New BI/Analytics Synergy - 7-30-2015 - Eckerson Group
To stay relevant in a fast-changing business and data environment, business and analytics leaders need to recognize that their teams are no longer the center of the data universe. They need to reach out and partner with other data analytics players in the organization and create a shared vision for the future. The new business analytics leader fosters a rich analytical ecosystem of people, processes and technologies that fuels a data-driven organization.
You Will Learn:
- How the data world has changed and why
- The cyclical nature of power in the data world
- Characteristics of the new analytical ecosystem
- The role of BI leaders and teams in the new world order
When it comes to creating an enterprise AI strategy: if your company isn’t good at analytics, it’s not ready for AI. Succeeding in AI requires being good at data engineering AND analytics. Unfortunately, management teams often assume they can leapfrog best practices for basic data analytics by directly adopting advanced technologies such as ML/AI – setting themselves up for failure from the get-go. This presentation explains how to get basic data engineering and the right technology in place to create and maintain data pipelines so that you can solve problems with AI successfully.
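The "basic data engineering" the abstract calls a prerequisite for AI boils down to reliable extract-transform-load steps that feed clean data into pipelines. A minimal sketch of one such pipeline stage (the records and field names are invented for illustration):

```python
# Minimal ETL-style pipeline stage: extract raw records, clean and
# type them, and pass the result downstream.
# Records and field names are invented for illustration.

raw = [
    {"id": "1", "temp_c": "21.5"},
    {"id": "2", "temp_c": ""},        # missing reading: dropped
    {"id": "3", "temp_c": "19.0"},
]

def transform(records):
    """Keep only complete records, with fields parsed to proper types."""
    out = []
    for r in records:
        if r["temp_c"]:
            out.append({"id": int(r["id"]), "temp_c": float(r["temp_c"])})
    return out

clean = transform(raw)
assert clean == [{"id": 1, "temp_c": 21.5}, {"id": 3, "temp_c": 19.0}]
```

The point the presentation makes is that until steps like this run dependably at scale, layering ML/AI on top only amplifies the data-quality problems.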
2020 Big Data & Analytics Maturity Survey Results - AtScale
Together with Cloudera and ODPI.org, AtScale surveyed over 150 data & analytics leaders. This presentation reveals the results of the survey. To download the report, go to: https://tinyurl.com/qmwofof
This presentation will discuss the stories of 3 companies that span different industries; what challenges they faced and how cloud analytics solved for them; what technologies were implemented to solve the challenges; and how they were able to benefit from their new cloud analytics environments.
The objectives of this session include:
• Detail and explain the key benefits and advantages of moving BI and analytics workloads to the cloud, and why companies shouldn’t wait any longer to make their move.
• Compare the different analytics cloud options companies have, and the pros and cons of each.
• Describe some of the challenges companies may face when moving their analytics to the cloud, and what they need to prepare for.
• Provide the case studies of three companies, what issues they were solving for, what technologies they implemented and why, and how they benefited from their new solutions.
• Learn what to look for when considering a partner and trusted advisor to assist with an analytics cloud migration.
Active Governance Across the Delta Lake with Alation - Databricks
Alation provides a single interface through which users and stewards can apply active, agile data governance across Databricks Delta Lake and the Databricks SQL Analytics Service. Understand how Alation can expand adoption of the data lake while enabling safe and responsible data consumption.
Analytics in a Day Ft. Synapse Virtual Workshop - CCG
Say goodbye to data silos! Analytics in a Day will simplify and accelerate your journey towards the modern data warehouse. Join CCG and Microsoft for a half-day virtual workshop, hosted by James McAuliffe.
Analyst Webinar: Best Practices In Enabling Data-Driven Decision Making - Denodo
Watch full webinar here: https://bit.ly/37YkgN4
This presentation looks at the trends that are emerging from companies on their journeys to becoming data-driven enterprises.
These trends are taken from a survey of 500 companies and highlight critical success factors, what companies are doing, their progress so far and their plans going forward. It also looks at the role that data virtualization plays within the data-driven enterprise.
During the session we'll address:
- What is a data-driven enterprise?
- What are the critical success factors?
- What are companies doing to create a data-driven enterprise and why?
- What progress are they making?
- What are the plans on people, process and technologies?
- Why is data virtualization central to provisioning and accessing data in a data-driven enterprise?
- How should you get started?
Predictive Analytics - Big Data Warehousing Meetup - Caserta
Predictive analytics has always been about the future, and the age of big data has made that future an increasingly dynamic place, filled with opportunity and risk.
The evolution of advanced analytics technologies and the continual development of new analytical methodologies can help to optimize financial results, enable systems and services based on machine learning, obviate or mitigate fraud and reduce cybersecurity risks, among many other things.
Caserta Concepts, Zementis, and guest speaker from FICO presented the strategies, technologies and use cases driving predictive analytics in a big data environment.
For more information, visit www.casertaconcepts.com or contact us at info@casertaconcepts.com
Rethink Analytics with an Enterprise Data Hub - Cloudera, Inc.
Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:
> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?
It covers the basics of analytics, the types of analytics, analytics tools and techniques, and a brief case study demonstrating predictive analytics with the decision tree algorithm from machine learning.
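To give a flavor of the decision-tree approach such a case study demonstrates, here is a minimal sketch of a one-level decision tree (a "stump") that picks the split threshold minimizing misclassification. The toy churn data and feature are invented; real trees recurse on many features, but the split-selection idea is the same:

```python
# Minimal one-level decision tree ("stump") for a binary label.
# Toy data: (monthly_spend, label) where label 1 = churned.
# Data and feature names are invented for illustration.

data = [
    (20, 1), (35, 1), (40, 1), (55, 0), (70, 0), (90, 0),
]

def stump_errors(threshold):
    """Misclassifications if we predict churn=1 below the threshold."""
    return sum(1 for x, y in data if (1 if x < threshold else 0) != y)

# Candidate splits: midpoints between adjacent feature values.
xs = sorted(x for x, _ in data)
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
best = min(candidates, key=stump_errors)

def predict(x):
    return 1 if x < best else 0

assert best == 47.5                  # split between 40 and 55
assert stump_errors(best) == 0       # this toy data is perfectly separable
assert predict(30) == 1 and predict(80) == 0
```

A full decision-tree learner applies this split search recursively to each resulting partition, usually scoring splits by Gini impurity or entropy rather than raw error count.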
Agile Data Management with Enterprise Data Fabric (ASEAN) - Denodo
Watch full webinar here: https://bit.ly/3juxqaw
In a world where machine learning and artificial intelligence are changing our everyday lives, digital transformation tops the strategic agenda in many private and government organizations. Data is becoming the lifeblood of a company, flowing seamlessly through it to enable deep business insights, create new opportunities, and optimize operations.
Chief Data Officers and Data Architects are under continuous pressure to find the best ways to manage the overwhelming volumes of the data that tend to become more and more distributed and diverse.
Physically moving data to a single location for reporting and analytics is no longer an option – a fact accepted by the majority of data professionals.
Join us for this webinar to learn about modern virtual data landscapes, including:
- Virtual Data Fabric
- Data Mesh
- Multi-Cloud Hybrid architecture
- How to leverage the Denodo Data Virtualization platform to implement these modern data architectures
Informatica Becomes Part of the Business Data Lake Ecosystem - Capgemini
Informatica is now part of the Business Data Lake ecosystem developed by Capgemini and Pivotal. Customers worldwide will now be able to leverage Informatica’s data integration software in addition to Pivotal’s advanced big data, analytics and application software, and Capgemini’s industry and implementation expertise. Informatica will deliver certified technologies for Data Integration, Data Quality and Master Data Management (MDM) to help enterprises distill raw data into actionable insights.
http://www.capgemini.com/resources/the-business-data-lake-delivering-the-speed-and-accuracy-to-solve-your-big-data-problems
Watch this webinar in full here: https://buff.ly/2MVTKqL
Self-Service BI promises to remove the bottleneck that exists between IT and business users. The truth is, if data is handed over to a wide range of data consumers without proper guardrails in place, it can result in data anarchy.
Attend this session to learn why data virtualization:
• Is a must for implementing the right self-service BI
• Makes self-service BI useful for every business user
• Accelerates any self-service BI initiative
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote - Caserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced their big data implementation time from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focus on how to extend and optimize Hadoop based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft, Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Bank Struggles Along the Way for the Holy Grail of Personalization: Customer 360 - Databricks
Ceska sporitelna is one of the largest banks in Central Europe, and one of its main goals is to improve the customer experience by weaving together the digital and traditional banking approaches. The talk focuses on the real-world challenges, both technical and organizational, of shifting the vision from PowerPoint slides into production:
- Implementing a Spark- and Databricks-centric analytics platform in the Azure cloud, combined with an on-prem data lake, in the EU-regulated financial environment
- Forming a new team focused on solving use cases on top of Customer 360 in a 10,000+ employee enterprise
- Demonstrating this effort on real use cases, such as client risk scoring using both offline and online data
- Using Spark and its MLlib to turn hundreds of millions of client interactions into personalized omni-channel CRM campaigns
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven - DataWorks Summit
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Self Service Analytics enabled by Data Virtualization from Denodo - Denodo
Watch full webinar here: https://bit.ly/39U9qY8
Self-service analytics and BI is often loosely defined as letting users discover and access data without asking IT to create a data mart, or letting users export or copy data from the sources directly into their own analytics tools and systems. The challenge is not just providing access to the data – even Excel can do that – but doing so in real time without creating processing overhead, while delivering trusted data with the best possible response time, in a managed, governed and secure way, so that users can trust the output of their analysis.
Data Virtualization provides a data access platform that allows users to access the data they need from multiple data sources, when they need it, and with the best possible response time. In addition, a Data Marketplace built on top of this proven technology enables Self Service Analytics by exposing consistent and governed data sets to be discovered by users, providing the trusted foundation for a successful Self-Service Analytics initiative.
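The access pattern described above can be illustrated with a toy federation layer: a virtual view resolves a query against several sources at request time instead of copying their data into a central store. The source names and records below are invented, and real data virtualization platforms add connectors, query optimization, caching and security on top of this core idea:

```python
# Toy data-virtualization layer: a "virtual view" federates two
# sources at query time instead of physically copying their data.
# Source names and records are invented for illustration.

crm = [  # source 1: customer master data
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Globex"},
]

billing = [  # source 2: invoices
    {"cust_id": 1, "amount": 100.0},
    {"cust_id": 1, "amount": 50.0},
    {"cust_id": 2, "amount": 80.0},
]

def revenue_by_customer():
    """Virtual view: join the two sources on cust_id at request time."""
    totals = {}
    for inv in billing:
        totals[inv["cust_id"]] = totals.get(inv["cust_id"], 0.0) + inv["amount"]
    return {c["name"]: totals.get(c["cust_id"], 0.0) for c in crm}

assert revenue_by_customer() == {"Acme": 150.0, "Globex": 80.0}
```

Because the join runs on demand, consumers always see current source data; the trade-off, which the webinar's "response time" discussion addresses, is that query cost is paid at read time.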
These slides - based on the webinar - shed light on how business stakeholders make the most of information from their big data environments and the requirements those stakeholders have to turn big data into business impact.
Using recent big data end-user research from leading IT analyst firm Enterprise Management Associates (EMA), data from Vertica’s recent benchmarks on SQL on Hadoop, and firsthand customer experiences, viewers will learn:
- Use cases where end users around the world are using big data in their organizations
- How maturity with big data strategies impacts why and how business stakeholders use information from their big data environments
- How Vertica empowers the use of information from big data environments
Embedding Insight through Prediction Driven Logistics - Databricks
Aggreko are a leading provider of temporary power and temperature control solutions, serving customers across the globe as they work on projects ranging from the Olympics to aiding humanitarian disaster relief. In this talk, Helena and Andy will discuss how the Insights team have developed scalable machine learning solutions to support the business. In particular they will discuss fuel consumption forecasts that have helped Aggreko’s fuel logistics teams improve customer service levels and reduce costs by becoming more proactive and insight driven.
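A fuel-consumption forecast of the kind described can be sketched, in its simplest form, as a trailing moving average over recent readings. The daily readings below are invented, and Aggreko's actual models (not detailed in this abstract) would be far richer, incorporating seasonality, site load and weather:

```python
# Simplest possible consumption forecast: trailing moving average.
# Daily fuel readings (litres) are invented for illustration.

readings = [410, 395, 420, 405, 430, 415, 425]

def forecast_next(history, window=3):
    """Predict tomorrow's value as the mean of the last `window` readings."""
    recent = history[-window:]
    return sum(recent) / len(recent)

pred = forecast_next(readings)
assert pred == (430 + 415 + 425) / 3  # mean of the last three days
```

Even a baseline this crude lets a logistics team schedule refuelling proactively instead of reacting to low-fuel alarms, which is the operational shift the talk describes.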
An example of a BI application technology comparison, based on customer needs and application capabilities, performed by DWApplications.
This is one of 3 deliverables in the free BI Roadmap Assessment provided by DWApplications.
- BI application technology comparison
- Current and future state assessment
- Timeline, resource and implementation plan
If you are interested in a free BI Roadmap Assessment, contact: scott.mitchell@dwapplications.com
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI - Denodo
Watch full webinar here: https://bit.ly/3zVJRRf
According to Dresner Advisory’s 2020 Self-Service Business Intelligence Market Study, 62% of responding organizations say self-service BI is critical for their business. Today’s self-service BI goes well beyond IT enabling a few executives and business users with self-service dashboarding or report generation: predictive analytics, self-service data preparation, and collaborative data exploration are all facets of the new generation of self-service BI. And while democratization of data for self-service BI holds many benefits, strict data governance becomes increasingly important alongside it.
In this session we will discuss:
- The latest trends and scopes of self-service BI
- The role of logical data fabric in self-service BI
- How Denodo enables self-service BI for a wide range of users
- Customer case study on self-service BI
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod... - Hortonworks
Many enterprises are turning to Apache Hadoop to enable Big Data Analytics and reduce the costs of traditional data warehousing. Yet, it is hard to succeed when 80% of the time is spent on moving data and only 20% on using it. It’s time to swap the 80/20! The Big Data experts at Attunity and Hortonworks have a solution for accelerating data movement into and out of Hadoop that enables faster time-to-value for Big Data projects and a more complete and trusted view of your business. Join us to learn how this solution can work for you.
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Denodo
This presentation has been extracted from a full webinar organized by Denodo. To learn more click here: http://bit.ly/1FOMD90
Big Data, Internet of Things, Data Lakes, Streaming Analytics, Machine Learning… these are just a few of the buzzwords being thrown around in the world of data management today. They provide us with new sources of data, new forms of analytics, and new ways of storing, managing and utilizing our data. The reality however, is that traditional Data Warehouse architectures are no longer able to handle many of these new technologies and a new data architecture is required.
So what does the new architecture look like? Does the enterprise data warehouse still have a role? Where do these new technologies fit in? How can business users easily and quickly access the various sources of data and analytic results at the right time to make the right decisions in this new world order?
Dr. Claudia Imhoff addresses these questions and presents the Extended Data Warehouse architecture (XDW), demonstrating the need for each component and how an enterprise combines these into appropriate workflows for proper decision support.
Learn about the organizational and architectural strategies needed to make self-service analytics successful. Self-service is more about process and training instead of only focusing on tools.
Download this research to read about self-service architecture in detail: https://www.eckerson.com/articles/a-reference-architecture-for-self-service-analytics
If you need help with self-service analytics, data architecture or data management, contact us on the following link: https://www.eckerson.com/consulting
Managing Data Sprawl with Data Catalogs for Self-ServiceEckerson Group
Self-service analytics tools solved one problem but created another. Now we need data catalogs and self-service data.
When you're done reading the slides, you can download our research: The Ultimate Guide to Data Catalogs https://www.eckerson.com/articles/the-ultimate-guide-to-data-catalogs-key-things-to-consider-when-selecting-a-data-catalog
We also help you choose a data catalog and teach you how to implement it across your enterprise. Contact us-https://www.eckerson.com/consulting
Effective tips for leaders in the #BI and #Analytics space who want to become better leaders.
Visit www.eckerson.com to know more about our research, education and consulting services.
In this webcast, Wayne Eckerson discusses the impact of artificial intelligence (AI) and machine learning (ML) on the finance industry, specifically how AI/ML will lead to augmented intelligence, in which man and machine collaborate to deliver optimal outcomes and decisions.
Watch the webinar on this link: https://www.youtube.com/watch?time_continue=506&v=UwBEd_m0XBs
TDWI Boston Keynote: The New BI/Analytics Synergy Eckerson Group
To stay relevant in a fast-changing business and data environment, business and analytics leaders need to recognize that their teams are no longer the center of the data universe. They need to reach out and partner with other data analytics players in the organization and create a shared vision for the future. The new business analytics leader fosters a rich analytical ecosystem of people, processes and technologies that fuels a data-driven organization.
You Will Learn:
- How the data world has changed and why
- The cyclical nature of power in the data world
- Characteristics of the new analytical ecosystem
- The role of BI leaders and teams in the new world order
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Monitoring Java Application Security with JDK Tools and JFR Events
Big Data Analytics Webinar
1. Big Data Analytics: Profiling the Use of Analytic Platforms in User Organizations. Wayne Eckerson, Director of Research, Business Applications and Architecture Media Group, TechTarget
3. Why Big Data?
- Changing data types
- Technology advances
- Insourcing & outsourcing
- Developers discover data
4. Analytics against Big Data
- Patterns
- Real-time
- Complex calculations
- Sustainable advantage
5. Framework for success (slide diagram). Pillars: Culture (fact-based decisions, performance measurement), People (business executives, power users, casual users, IT professionals), Organization (analytics center of excellence), Architecture (reporting, analytics, event-driven, BI and data governance), and Analytic Platform.
6. Analytic Platforms. An analytic platform is a data management system optimized for query processing and analytics that provides superior price-performance and availability compared with general-purpose database management systems. Have you purchased or implemented an analytic platform as defined in this survey?
12. BI Delivery Framework 2020 (slide diagram). Four intelligences: Business Intelligence (data warehousing; MAD dashboards, reports, and alerts for casual users), Continuous Intelligence (event-driven architecture; CEP/streams; event detection and correlation), Analytics Intelligence (analytic sandboxes; ad hoc exploration with Excel, Access, SAS, OLAP, visual analysis, analytic workbenches, and Hadoop), and Content Intelligence (universal information access; search, NoSQL, Hadoop/MapReduce, key-value pair indexes).
13. Top-down "Business Intelligence" vs. bottom-up "Analytics Intelligence"
Top-down: driven by corporate objectives and strategy; reporting and monitoring for casual users; non-volatile data; data warehouse architecture; predefined metrics; "schema heavy." Pros: alignment, consistency. Cons: hard to build, politically charged, hard to change, expensive.
Bottom-up: driven by processes and projects; analysis and prediction for power users; volatile data; ad hoc queries; analytics architecture; "schema light." Pros: quick to build, politically uncharged, easy to change, low cost. Cons: alignment, consistency.
Reports beget analysis; analysis begets reports.
14. BI Architecture - 2020 (slide diagram). Sources: operational systems (structured data), machine data, Web data, audio/video data, documents and text, and external data feed an extract, transform, load layer (batch, near real-time, or real-time). Top-down path: a data warehouse, departmental data marts, and a BI server deliver reports and dashboards to casual users, with a streaming/CEP engine raising alerts. Bottom-up path: power users run ad hoc queries against virtual sandboxes, an in-memory BI sandbox, free-standing sandboxes, a Hadoop cluster, and an analytical platform or non-relational database.
16. Recommendations
- Harmonize top-down and bottom-up BI
- Implement a BI architecture that supports multiple intelligences
- Create multiple types of analytic sandboxes
- Implement analytic platforms that meet business and technical requirements
Editor's Notes
Welcome to this Webcast on Big Data Analytics. My name is Wayne Eckerson, a long-time industry analyst and thought leader in the business intelligence market. I will be your speaker today. One housekeeping item before we begin: this is a prerecorded Webcast, so there will be no Q&A session at the end. If you have questions for me, please don't hesitate to send me an email at weckerson@gmail.com. I'd be happy to dialogue with you about this important topic!
The research and findings that I will present in this Webcast are based on a report that you can download for free from the BeyeNetwork web site or from Bitpipe. It's a 40-page report, so I hope you take the time to peruse its details. This 60-minute webcast will present highlights from that report.
First, I'll talk about the big data analytics movement: what's behind it, what it is, and best practices for doing it. Second, I'll talk about big data analytics engines. I'll explain the technology most of these engines use to turbo-charge analytical queries and then catalog vendors in the space. Third, I'll lump analytic engines into four categories and present survey results that show what causes customers to buy each category of product. Finally, and perhaps most importantly, I'll describe a framework for implementing big data analytics and show how to expand your existing business intelligence and data warehousing architecture to handle new requirements. So with that, let's begin.
I’d like to thank our sponsors who made the research and this webcast possible.
There has been a lot of talk about "big data" in the past year, which I find a bit puzzling. I've been in the data warehousing field for more than 15 years, and data warehousing has always been about big data. So what's new in 2011? Why are we talking about "big data" today? There are several reasons:
Changing data types. Organizations are capturing different types of data today. Until about five years ago, most data was transactional in nature, consisting of numeric data that fit easily into the rows and columns of relational databases. Today, the growth in data is fueled largely by unstructured data from Web sites as well as machine-generated data from an exploding number of sensors.
Technology advances. Hardware has finally caught up with software. The exponential gains in price-performance exhibited by computer processors, memory, and disk storage have finally made it possible to store and analyze large volumes of data at an affordable price. Organizations are storing and analyzing more data because they can!
Insourcing and outsourcing. Because of the complexity and cost of storing and analyzing Web traffic data, most organizations have outsourced these functions to third-party service bureaus. But as the size and importance of corporate e-commerce channels have increased, many are now eager to insource this data to gain greater insights about customers. At the same time, virtualization technology is making it attractive for organizations to move large-scale data processing to private hosted networks or public clouds.
Developers discover data. The biggest reason for the popularity of the term "big data" is that Web and application developers have discovered the value of building data-intensive applications. To application developers, "big data" is new and exciting.
Of course, for those of us who have made our careers in the data world, the new era of "big data" is simply another step in the evolution of data management systems that support reporting and analysis applications.
Big data by itself, regardless of the type, is worthless unless business users do something with it that delivers value to their organizations. That's where analytics comes in. Although organizations have always run reports against data warehouses, most haven't opened these repositories to ad hoc exploration. This is partly because analysis tools are too complex for the average user, but also because the repositories often don't contain all the data needed by the power user. But this is changing.
Patterns. A valuable characteristic of "big data" is that it contains more patterns and interesting anomalies than "small" data. Thus, organizations can gain greater value by mining large data volumes than small ones. Fortunately, techniques already exist to mine big data thanks to companies, such as SAS Institute and SPSS (now part of IBM), that ship analytical workbenches.
Real-time. Organizations that accumulate big data quickly recognize that they need to change the way they capture, transform, and move data, from a nightly batch process to a continuous process using micro-batch loads or event-driven updates. This technical constraint pays big business dividends because it makes it possible to deliver critical information to users in near real time.
Complex analytics. In addition, during the past 15 years, the "analytical IQ" of many organizations has evolved from reporting and dashboarding to lightweight analysis. Many are now on the verge of upping their analytical IQ by implementing predictive analytics against both structured and unstructured data. This type of analytics can be used to do everything from delivering highly tailored cross-sell recommendations to predicting failure rates of aircraft engines.
Sustainable advantage.
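The shift from a nightly batch to continuous micro-batch loading described above can be sketched as a simple loop: buffer incoming events and flush whenever the buffer fills or a time window expires. This is a minimal illustration only; the `load_batch` callback is a hypothetical stand-in for whatever writes a batch to the warehouse.

```python
import time

def micro_batch_loader(event_source, load_batch, batch_size=500, max_wait_s=5.0):
    """Accumulate events and flush either when the batch fills up or when
    max_wait_s elapses -- near-real-time loading instead of a nightly batch."""
    buffer = []
    deadline = time.monotonic() + max_wait_s
    for event in event_source:
        buffer.append(event)
        if len(buffer) >= batch_size or time.monotonic() >= deadline:
            load_batch(buffer)          # hand a full (or timed-out) batch to the warehouse
            buffer = []
            deadline = time.monotonic() + max_wait_s
    if buffer:
        load_batch(buffer)              # flush the remaining tail

# Toy usage: 10 events, batches of 4.
loaded = []
micro_batch_loader(range(10), loaded.append, batch_size=4, max_wait_s=60.0)
print(loaded)  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The same structure applies whether the events come from a message queue or change-data-capture feed; only the source iterator and the load callback change.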
At the same time, executives have recognized the power of analytics to deliver a competitive advantage, thanks to the pioneering work of thought leaders such as Tom Davenport, who co-wrote the book "Competing on Analytics." In fact, forward-thinking executives recognize that analytics may be the only true source of sustainable advantage, since it empowers employees at all levels of an organization with information to help them make smarter decisions.
However, the road to big data analytics is not easy and success is not guaranteed. Analytical champions are still rare. That's because succeeding with big data analytics requires the right culture, people, organization, architecture, and technology.
The right culture. Analytical organizations are championed by executives who believe in making fact-based decisions or validating intuition with data. These executives create a culture of performance measurement in which individuals and groups are held accountable for the outcomes of predefined metrics aligned with strategic objectives.
The right people. You can't do big data analytics without power users, or more specifically, business analysts, analytical modelers, and data scientists. These folks possess a rare combination of skills and knowledge: they have a deep understanding of business processes and the data that sits behind those processes, and are skillful in the use of various analytical tools, including Excel, SQL, analytical workbenches, and coding languages.
The right organization. Historically, analysts with the aforementioned skills were pooled in pockets of an organization, hired by department heads. But analytical champions create a shared-service organization (i.e., an analytical center of excellence) that makes analytics a pervasive competence. Analysts are still assigned to specific departments and processes, but they are also part of a central organization that provides collaboration, camaraderie, and a career path.
Analytic platform. At the heart of an analytical infrastructure is an analytic platform, the underlying data management system that consumes, integrates, and provides user access to information for reporting and analysis activities. Today, many vendors, including most sponsors of this Webinar, provide specialized analytic platforms that deliver dramatically better query performance than existing systems. There are many different types of analytic platforms sold by dozens of vendors.
So what is an analytic platform? It's a data management system optimized for query processing and analytics that provides superior price-performance and availability compared with general-purpose database management systems. Given this definition, 72% of our survey respondents said they already have an analytic platform. This is a surprisingly high percentage given that these platforms, except for Teradata and Sybase IQ, have only been generally available for the past five years or so. In looking at the survey responses, I did see a lot of Microsoft customers who think SQL Server fits this definition, which it doesn't. Nevertheless, I think the results speak volumes for the power of these analytic platforms to optimize the performance of analytical applications.
Analytic platforms offer superior price-performance for many reasons. And while product architectures vary considerably, most support the following characteristics:
Massively parallel processing (MPP). Most analytic platforms spread data across multiple nodes, each containing its own CPU, memory, and storage and connected to a high-speed backplane. When a user submits a query or runs an application, the "shared nothing" system divides the work across the nodes, each of which processes the query on its piece of the data and ships the results to a master node that assembles the final result and sends it to the user. MPP systems are highly scalable, since you simply add nodes to increase processing power.
Balanced configurations. Analytic platforms optimize the configuration of CPU, memory, and disk for query processing rather than transaction processing. Analytic appliances essentially "hard wire" this configuration into the system and don't let customers change it, whereas analytic bundles or analytic databases (i.e., software-only solutions) allow customers to configure the underlying hardware to match unique application requirements.
Storage-level processing. Netezza's big innovation was to move some database functions, specifically data filtering, into the storage system using field-programmable gate arrays. This storage-level filtering reduces the amount of data that the DBMS has to process, which significantly increases query performance. Many vendors have followed suit, moving various database functions into hardware.
Columnar storage and compression. Many vendors have followed the lead of Sybase, Sand Technology, ParAccel, and other columnar pioneers by storing data in columns, not rows. Since most queries ask for a subset of columns rather than entire rows, storing data in columns minimizes the amount of data that needs to be retrieved from disk and processed by the database, accelerating query performance.
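The MPP "shared nothing" pattern above can be seen in miniature: partition the data across workers, let each compute a partial result on its own slice, and have a master merge the partials. This is an illustrative simulation (workers stand in for nodes, simulated with a thread pool), not any vendor's implementation; `mpp_query` and its round-robin partitioning are invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_aggregate(partition):
    """Each 'node' scans only its own slice of the data (shared nothing)."""
    amounts = [row["amount"] for row in partition if row["region"] == "EMEA"]
    return sum(amounts), len(amounts)

def mpp_query(rows, n_nodes=4):
    """'Master' node: scatter partitions to workers, gather the partial
    results, and assemble the answer -- here, AVG(amount) WHERE region = 'EMEA'."""
    partitions = [rows[i::n_nodes] for i in range(n_nodes)]  # round-robin distribution
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:    # workers stand in for nodes
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

rows = [{"region": "EMEA" if i % 2 else "APAC", "amount": float(i)} for i in range(1000)]
print(mpp_query(rows))  # → 500.0
```

Note that adding "nodes" only changes how the work is split, not the answer, which is the scalability property the paragraph describes.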
In addition, since data elements in many columns are repeated (e.g., "male" and "female" in the gender field), column-store systems can eliminate duplicates and compress data volumes significantly, sometimes as much as 10:1. This enables more data to fit into memory, which speeds processing.
Memory. Many analytic platforms make liberal use of memory caches to speed query processing. Some products, such as SAP HANA and QlikTech's QlikView, store all data in memory, while others store recently queried results in a smart cache so others who need the same data can pull it from memory rather than from disk. Given the growing affordability of memory and the widespread deployment of 64-bit operating systems, many analytic platforms are expanding their memory footprints to speed processing.
Query optimizer. Analytic platform vendors invest a lot of time and money researching ways to enhance their query optimizers to handle various workloads. A good query optimizer is the biggest contributor to query performance. In this respect, the older vendors with established products have an edge.
Plug-in analytics. True to their name, many analytic platforms offer built-in support for complex analytics. This includes complex SQL, such as correlated subqueries, as well as procedural code implemented as plug-ins to the database. Some vendors offer a library of analytical routines, from fuzzy matching algorithms to market-basket calculations. Some, like Aster Data (now owned by Teradata), provide native support for MapReduce programs that are called using SQL.
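The column-store advantage is easy to see in a toy sketch: store each column contiguously, scan only the columns a query needs, and run-length encode low-cardinality columns such as gender. This is an illustration of the general idea, not any specific product's storage format.

```python
def run_length_encode(column):
    """Collapse runs of repeated values, e.g. a low-cardinality 'gender' column."""
    encoded = []
    for value in column:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)  # extend the current run
        else:
            encoded.append((value, 1))                 # start a new run
    return encoded

def run_length_decode(encoded):
    return [value for value, count in encoded for _ in range(count)]

# Row store: every query touches whole rows.
rows = [("alice", "F", 34), ("bob", "M", 41), ("bill", "M", 29), ("carol", "F", 52)]

# Column store: each column is held (and scanned) independently.
columns = {
    "name":   [r[0] for r in rows],
    "gender": [r[1] for r in rows],
    "age":    [r[2] for r in rows],
}

# A query on gender reads one small, compressed column, not the full rows.
gender_rle = run_length_encode(sorted(columns["gender"]))
print(gender_rle)  # → [('F', 2), ('M', 2)]
assert run_length_decode(gender_rle) == sorted(columns["gender"])
```

With millions of rows and only two distinct gender values, the encoded column shrinks to a handful of (value, count) pairs, which is how column stores reach the compression ratios cited above.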
- MPP Analytic Databases: Row-based databases designed to scale out on a cluster of commodity servers and run complex queries in parallel against large volumes of data.
- Columnar Databases: Database management systems that store data in columns, not rows, and support high data compression ratios.
- Analytic Appliances: Preconfigured hardware-software systems designed for query processing and analytics that require little tuning.
- Analytic Bundles: Predefined hardware and software configurations that are certified to meet specific performance criteria, but that the customer must purchase and configure.
- In-memory Databases: Systems that load data into memory to execute complex queries.
- Distributed File-based Systems: Designed for storing, indexing, manipulating, and querying large volumes of unstructured and semi-structured data.
- Analytic Services: Analytic platforms delivered as a hosted or public cloud-based service.
- Nonrelational Databases: Optimized for querying unstructured as well as structured data.
- CEP/Streaming Engines: Ingest, filter, calculate, and correlate large volumes of discrete events and apply rules that trigger alerts when conditions are met.
Our survey grouped analytic platforms into four major categories to make it easier to compare and contrast various product offerings:
Analytic databases are software-only analytic platforms that run on a variety of hardware that customers purchase. Customers install, configure, and tune the software, including the analytic database, before they can use the analytic system. Most MPP analytic databases, columnar databases, and in-memory databases qualify as analytic databases. As a rule of thumb, analytic databases are good for organizations that want to tune database performance for specific workloads or run the RDBMS software on a virtualized private cloud.
Analytic appliances. These are hardware-software combinations designed to support ad hoc queries and other types of analytic processing. This category includes both analytic appliances and analytic bundles. As a rule of thumb, analytic appliances are fast to deploy and easy to maintain, and make good replacements for Microsoft SQL Server or Oracle data warehouses that have run out of gas. They also make great standalone data marts to offload complex queries from large, maxed-out data warehousing hubs.
Analytic services. Rather than deploy an analytic platform in a customer's data center, an analytic service enables customers to house the system in an off-site hosted environment or public cloud. As a rule of thumb, analytic services are great for development, test, and prototyping applications, as well as for organizations that don't have an IT department, want to outsource data center operations, or want to get up and running very quickly.
File-based analytic systems. This generally refers to Hadoop, but we also lumped NoSQL or nonrelational systems into this category, although that's not entirely accurate, since nonrelational systems are databases. However, since both are used to store and analyze large volumes of unstructured data and don't require an up-front schema design, they share more similarities than differences.
As a rule of thumb, this category of products is ideal for processing large volumes of Web traffic and other log-based or machine-generated data.
When examining the business requirements driving purchases of analytic platforms overall, three percolate to the top: "faster queries," "storing more data" and "reduced costs." These requirements are followed by "more complex queries," "higher availability" and "quicker to deploy." This ranking is based on summing the percentages of all four deployment options for each requirement. More important, this chart shows that customers purchase each deployment option for different reasons. Analytic database customers value "quick to deploy" (46%), "built-in analytics" (43%) and "easier maintenance" (41%) more than other requirements, while analytic service customers favor "storing more data" (67%), "high availability" (67%), "reduced costs" (56%) and "more concurrent users" (56%). Not surprisingly, customers with file-based systems look for the ability to support "more diverse data" (64%) and "more flexible schemas" (64%), two hallmarks of a Hadoop/NoSQL offering. Analytic appliance customers had the most emphatic requirements: roughly two-thirds value faster queries (70%), more complex queries (64%) and faster load times (63%), suggesting that analytic appliance customers seek to offload complex ad hoc queries from data warehouses.
We also asked respondents if they were looking for a specific deployment option when evaluating products (see Figure 14). Except for customers of file-based systems, most customers investigated products across these four categories. For example, Blue Cross Blue Shield of Kansas City looked at three columnar databases (i.e., software-only) and an appliance before making a decision. Interestingly, no analytic service customers intended to subscribe to a service prior to evaluating products. That’s because many analytic-service customers subscribe to such services on a temporary basis, either to test or prototype a system or to wait until the IT department readies the hardware to house the system. Some of these customers continue with the services, recognizing that they provide a more cost-effective test and development environment than an in-house system.
Now that we’ve discussed the engines that drive big data analytics, let’s step back a bit and look at the overall framework in which they operate. I introduced this BI Delivery Framework 2020 in March. It’s basically my vision for what BI environments will look like in about 10 years. Instead of a single BI architecture to support reporting and analysis applications, there will be four intelligences, depicted in the middle. Let me briefly describe each.

Business Intelligence represents a classic data warehousing environment that delivers reports and dashboards primarily to casual users via a MAD framework. MAD stands for….

Moving to the right, Continuous Intelligence delivers near real-time information and alerts to operational workers using event-driven architectures that handle simple and complex events.

At the bottom, Analytics Intelligence enables power users to submit ad hoc queries against any data source using a variety of tools, ideally supported by analytic sandboxes built into the top-down environment.

To the left, Content Intelligence makes unstructured data an equal target for reporting and analysis applications. These systems use a variety of indexing technologies to store both structured and unstructured data and allow users to submit queries against them. This is a fast-growing area that encompasses Hadoop, NoSQL, and search-based technologies.

If you want more information on this framework, please download my first report, titled Analytic Architectures: BI Delivery Framework 2020, from BeyeNetwork’s Web site. But before leaving the framework, I want to drill down on Business Intelligence and Analytics Intelligence, the two most interrelated intelligences in this framework, and the two most problematic to manage synergistically.
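The Continuous Intelligence idea above, event-driven alerting for operational workers, can be sketched very simply: scan a stream of events and raise an alert whenever a condition fires. The event fields and threshold here are hypothetical, not from the framework itself.

```python
# Minimal sketch of event-driven alerting ("Continuous Intelligence"):
# yield an alert for every event whose value exceeds a threshold.
# Real systems use streaming/CEP engines; field names are made up.
def alerts(events, threshold=100):
    """Yield an alert string for each event exceeding the threshold."""
    for event in events:
        if event["value"] > threshold:
            yield f"ALERT: {event['id']} value={event['value']}"

stream = [{"id": "a", "value": 42}, {"id": "b", "value": 150}]
print(list(alerts(stream)))  # ['ALERT: b value=150']
```

A production CEP engine adds windowing, correlation of multiple event types, and guaranteed delivery, but the core pattern is this: rules evaluated continuously as events arrive, rather than queries run on demand.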
This is another depiction of the two intelligences. As I already mentioned, Business Intelligence is a top-down environment that delivers reports and dashboards to casual users. The output is based on predefined metrics aligned with strategic goals and objectives. In other words, in a top-down environment, you know in advance what questions users want to ask, and you model the environment accordingly. The benefits of this environment are that it delivers information consistency and alignment – the proverbial single version of the truth. The downsides are that it’s hard to build, hard to change, costly, and politically charged.

Analytics Intelligence is the opposite: a bottom-up environment geared to power users who submit ad hoc queries against a variety of sources to optimize processes and projects. This ad hoc environment is quick to build, easy to change, low cost, and politically uncharged. Yet it creates myriad analytic silos and thus forfeits information alignment and consistency.

The problem here is that most companies try to do all BI in either a top-down or a bottom-up environment. They may start with a top-down environment and get discouraged that it’s expensive and not geared to ad hoc requests. So they abandon it in favor of Analytics Intelligence, which works fine for a while until they realize they are overwhelmed with analytic silos and don’t have a common understanding of business performance.

The first key here is to recognize that you need both top-down and bottom-up environments. They are synergistic: analysis begets reports, and reports beget analysis. You do some analysis, find something interesting, and turn it into a regularly scheduled report for others to see. But that report should trigger additional questions, which call for additional analysis, and so on. The second key is to apply the right architecture to the right tasks. Typically, top-down environments address 80% of your information requirements and bottom-up environments the remaining 20%.
Yet, bottom-up may uncover 80% of your most valuable insights. Both are equally important and must be treated equivalently when building your corporate information architecture.
So here’s the architecture behind the BI Delivery Framework 2020. Let me step you through it. What’s pictured below is the classic top-down business intelligence and data warehousing environment that most organizations have already built. What’s pictured in pink are new components that address the other three intelligences.

To the left are new sources of data that aren’t typically loaded into data warehouses: machine-generated data, Web data, unstructured data, and external data. In front of these sources is a Hadoop cluster, which is ideal for batch processing large volumes of unstructured and semi-structured data, although it can also manage structured data. Atop the DW is the streaming/complex event processing engine that handles continuous intelligence and alerting. Below the DW is a free-standing database or sandbox that offloads bottom-up analytic processing from the DW, if desired. To the right and bottom is the power user, who traditionally has been left out of the classic BI/DW architecture. Now, power users have access to five types of analytic sandboxes designed to support ad hoc query processing, as well as external data if they have permission.
This is the same BI architecture but with the five sandboxes highlighted in green. A virtual sandbox inside the DW is a set of dedicated partitions into which analysts can upload their own data and mix it with corporate data. To avoid contention for DW resources, many companies create a free-standing sandbox to house data or users that the DW can’t support; basically, this option offloads complex processing to a separate machine. A local in-memory BI tool can also serve as a sandbox, as long as it requires analysts to publish their findings to an IT-managed server rather than proliferate spreadmarts. Hadoop is a sandbox because it allows power users who know the atomic data well, and who can write code, to submit queries against large volumes of unstructured or structured data. Like Hadoop, a DW can be a sandbox for those power users whom IT trusts to write well-designed SQL that doesn’t bog down performance for others.
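The virtual-sandbox idea above, analyst-uploaded data mixed with corporate data in one query, can be sketched with an in-memory database. SQLite stands in for the warehouse here, and all table and column names are hypothetical.

```python
# Sketch of a "virtual sandbox": an analyst-owned table living alongside
# corporate tables, so personal and corporate data can be joined ad hoc.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE corp_sales (region TEXT, revenue INT)")        # corporate data
con.execute("CREATE TABLE sandbox_targets (region TEXT, target INT)")    # analyst upload
con.executemany("INSERT INTO corp_sales VALUES (?, ?)", [("east", 120), ("west", 80)])
con.executemany("INSERT INTO sandbox_targets VALUES (?, ?)", [("east", 100), ("west", 100)])

# Mix personal and corporate data in a single ad hoc query
rows = con.execute("""
    SELECT s.region, s.revenue - t.target AS gap
    FROM corp_sales s JOIN sandbox_targets t ON s.region = t.region
    ORDER BY s.region
""").fetchall()
print(rows)  # [('east', 20), ('west', -20)]
```

In a real warehouse the sandbox would be a dedicated schema or set of partitions with quotas, so the analyst's uploads can't crowd out production workloads.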
So, to wrap up, I have five recommendations for supporting big data analytics:

#1. Harmonize top-down and bottom-up BI. For too long, organizations have tried to shoehorn all types of users into a single information architecture. That has never worked. Organizations need to recognize that casual users need top-down, interactive reports and dashboards, while power users need ad hoc exploratory tools and environments.

#2. Implement a BI architecture that supports multiple intelligences. The BI architecture of the future supports both traditional data warehousing to handle detailed transactional data and file-based and nonrelational systems to handle unstructured and semi-structured data. It also supports continuous intelligence through CEP and streaming engines, and analytic sandboxes for ad hoc exploration.

#3. Create multiple types of analytic sandboxes. Analytic sandboxes bring power users more fully into the corporate data environment by enabling them to mix personal and corporate data and run complex, ad hoc queries with minimal restrictions.

#4. Implement analytic platforms that meet business and technical requirements. There are four broad types of analytic platforms; pick the one that is right for you. Appliances are quick to deploy and easy to maintain; analytic databases provide the flexibility to run the software on the hardware of your choice; analytic services forgo the time and cost of provisioning a system in your own data center, if you have one; and file-based systems are ideal for processing unstructured and semi-structured data.