This document discusses implementing an enterprise data lake to address the challenges of big data. It defines a data lake as a massive, easily accessible, flexible and scalable data repository that stores both structured and unstructured data. The data lake helps meet big data challenges by allowing organizations to store all types of data in its native format and perform analysis without needing to structure the data first. The data lake also provides capabilities that existing enterprise data warehouses lack, such as active archiving of historical data and the ability to query across all data sources.
What Is a Data Lake? Know Data Lakes & the Sales Ecosystem (Rajaraj64)
As the name suggests, a data lake is a large reservoir of data, structured or unstructured, fed through disparate channels. The data is fed through channels in an ad-hoc manner into these data lakes; however, owing to a predefined set of rules or schema, correlation between the databases is established automatically to help with the extraction of meaningful information.
For more information, visit https://bit.ly/3lMLD1h
This white paper will present the opportunities laid down by the data lake and advanced analytics, as well as the challenges in integrating, mining and analyzing the data collected from these sources. It goes over the important characteristics of the data lake architecture and the Data and Analytics as a Service (DAaaS) model. It also delves into the features of a successful data lake and its optimal design. It goes over the data, applications, and analytics that are strung together to speed up the insight-brewing process for industry improvements, with the help of a powerful architecture for mining and analyzing unstructured data: the data lake.
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
3 Big Data Myths Revealed
One. On data size: the true value lies in how we use the data, not the quantity of data we have.
Two. Everyone needs easy and fast access to information. To meet this need, we require someone specialized in data (a Data Scientist).
Three. There are so-called special "software frameworks" such as Hadoop. It is a good system, but we generally need to join information from disparate sources that is scattered around.
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced their big data implementation time from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9 expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room-only audience, with a focus on the data lake, governance and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
For Impetus' White Papers archive, visit http://www.impetus.com/whitepaper
In this paper, Impetus focuses on why organizations need to design an Enterprise Data Warehouse (EDW) to support the business analytics derived from Big Data.
This paper describes the concept of a data lake and how it compares to a data warehouse. It reviews recent research and discusses the definition of both repositories: What types of data are catered for? Does ingesting data make it available for forging information, and beyond into knowledge? What types of people, processes and tools need to be involved to realise the benefits of using a data lake?
The recent focus on Big Data in the data management community brings with it a paradigm shift—from the more traditional top-down, “design then build” approach to data warehousing and business intelligence, to the more bottom-up, “discover and analyze” approach to analytics with Big Data. Where does data modeling fit in this new world of Big Data? Does it go away, or can it evolve to meet the emerging needs of these exciting new technologies? Join this webinar to discuss:
Big Data – A Technical & Cultural Paradigm Shift
Big Data in the Larger Information Management Landscape
Modeling & Technology Considerations
Organizational Considerations
The Role of the Data Architect in the World of Big Data
I have collected information for beginners to provide an overview of Big Data and Hadoop, which will help them understand the basics and give them a starting point.
Slides from May 2018 St. Louis Big Data Innovations, Data Engineering, and Analytics User Group meeting. The presentation focused on Data Modeling in Hive.
Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MIT CDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MIT CDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that is the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
The pioneers in the big data space have battle scars and have learnt many of the lessons in this report the hard way. But if you are a general manager just embarking on the big data journey, you should now have what they call the 'second mover advantage'. My hope is that this report helps you better leverage your second mover advantage. The goal here is to shed some light on the people and process issues in building a central big data analytics function.
Assumptions about Data and Analysis: Briefing Room webcast slides (Mark Madsen)
In many ways, moving data is like moving furniture: it's an unpleasant process dubbed an occasional necessary evil. But as the data pipelines of old decay, a new reality is taking shape: the data-native architecture. Unlike traditional data processing for BI and Analytics, this approach works on data right where it lives, thus eliminating the pain of forklifting, narrowing the margin of error, and expediting the time to business benefit. The new architecture embodies new assumptions, some of which we will talk about here.
Register for this episode of The Briefing Room to hear veteran Analyst Mark Madsen of Third Nature explain why this shift is truly tectonic. He'll be briefed by Steve Wooledge of Arcadia Data who will showcase his company's technology, which leverages a data-native architecture to fuel rapid-fire visualization and analysis of both big data and small.
This IT 812 business intelligence and data warehousing presentation looks into various factors, including data warehousing, data mining and business intelligence, as well as the use and benefits of these for modern-day business organizations.
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS have made it the de facto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their big-box, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a... (Denodo)
Companies such as Autodesk are fast replacing the once tried-and-true physical data warehouses with logical data warehouses/data lakes. Why? Because they are able to accomplish the same results in one-sixth of the time and with one-quarter of the resources.
In this webinar, Autodesk’s Platform Lead, Kurt Jackson, will describe how they designed a modern fast data architecture as a single unified logical data warehouse/data lake using data virtualization and contemporary big data analytics like Spark.
A logical data warehouse/data lake is a virtual abstraction layer over the physical data warehouse, big data repositories, cloud, and other enterprise applications. It unifies both structured and unstructured data in real time to power analytical and operational use cases.
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark (Todd Fritz)
In this session, we will discuss:
* reactive architecture tenets
* distributed “fast data” streams
* application and analytics focused Data Lake
Enterprise level concerns and the importance of holistic governance, operational management, and a Metadata Lake will be conceptually investigated. The next level of detail will be to explore what a prospective architecture looks like at scale with Terabytes of ingestion per day, how scale puts pressure on an architecture, and how to be successful without losing data in a mission critical system via resilient, self-healing, scalable technologies. DevOps and application architecture concerns will be first-class themes throughout.
Reactive principles and technology will be the second act of this talk. Kafka. Akka. Spark. Various streaming technologies (Kafka Streams, Akka Streams, Spark Streaming) will be reviewed to identify what they are best suited for. The fast data pipeline discussion will center around Kafka, Akka, and Apache Flink (Lightbend Fast Data platform). We’ll also walk through an exciting addition to the Akka family, Alpakka, which is a Camel equivalent for Enterprise Integration Patterns.
The final act will be to dive into the Data Lake, from both an analytics and application development perspective. Technologies used to explain concepts will include Amazon and Hadoop. A Data Lake may service multiple analytics consumers with various “views” (and access levels) of data. It may also be a participant of various applications, perhaps by acting as a centralized source for reference data or common middleware (in turn feeding the analytics aspect). The concept of the Metadata Lake to apply structure, meaning and purpose will be an over-arching success factor for a Data Lake. The difference between the Data Lake and Metadata Lake is conceptually similar to a Halocline… Various technologies (Iglu/Snowplow and more) will be discussed from a feature standpoint to flesh out the technology capabilities needed for Data Lake governance.
Building the Data Lake with Azure Data Factory and Data Lake Analytics (Khalid Salama)
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides the means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring and securing the environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build an Azure Data Factory pipeline to ingest the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ... (Amazon Web Services)
Uncovering new, valuable insights from big data requires organizations to collect, store, and analyze increasing volumes of data from multiple, often disparate sources at disparate points in time. This makes it difficult to handle big data with data warehouses or relational database management systems alone. A Data Lake allows you to store massive amounts of data in its original form, without the need to enforce a predefined schema, enabling a far more agile and flexible architecture, which makes it easier to gain new types of analytical insights from your data.
Learning Objectives:
• Introduce key architectural concepts to build a Data Lake using Amazon S3 as the storage layer
• Explore storage options and best practices to build your Data Lake on AWS
• Learn how AWS can help enable a Data Lake architecture
• Understand some of the key architectural considerations when building a Data Lake
• Hear some important Data Lake implementation considerations when using Amazon S3 as your Data Lake
"Conceptually, a data lake is a flat data store to collect data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created “on demand”, providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we will introduce key concepts for a data lake and present aspects related to its implementation. We will discuss critical success factors, pitfalls to avoid as well as operational aspects such as security, governance, search, indexing and metadata management. We will also provide insight on how AWS enables a data lake architecture.
A data lake is a flat data store to collect data in its original form, without the need to enforce a predefined schema. Instead, new schemas or views are created ""on demand"", providing a far more agile and flexible architecture while enabling new types of analytical insights. AWS provides many of the building blocks required to help organizations implement a data lake. In this session, we introduce key concepts for a data lake and present aspects related to its implementation. We discuss critical success factors and pitfalls to avoid, as well as operational aspects such as security, governance, search, indexing, and metadata management. We also provide insight on how AWS enables a data lake architecture. Attendees get practical tips and recommendations to get started with their data lake implementations on AWS."
Implementing a Data Lake with Enterprise-Grade Data Governance (Hortonworks)
Hadoop provides a powerful platform for data science and analytics, where data engineers and data scientists can leverage myriad data from external and internal data sources to uncover new insight. Such power also presents a few new challenges. On the one hand, the business wants more and more self-service; on the other hand, IT is trying to keep up with the demand for data while maintaining architecture and data governance standards.
In this webinar, Andrew Ahn, Data Governance Initiative Product Manager at Hortonworks, will address the gaps and offer best practices in providing end-to-end data governance in HDP. Andrew Ahn will be followed by Oliver Claude of Waterline Data, who will share a case study of how Waterline Data Inventory works with HDP in the Modern Data Architecture to automate the discovery of business and compliance metadata, data lineage, as well as data quality metrics.
We will introduce key concepts for a data lake and present aspects related to its implementation, discussing critical success factors, pitfalls to avoid, operational aspects, and insights on how AWS enables a serverless data lake architecture.
Speaker: Sebastien Menant, Solutions Architect, Amazon Web Services
Big data architectures and the data lake (James Serra)
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approaches to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
(BDT310) Big Data Architectural Patterns and Best Practices on AWS (Amazon Web Services)
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Enterprise Data Lake: How to Conquer the Data Deluge and Derive Insights that Matter
Data can be traced from various consumer sources. Managing data is one of the most serious challenges faced by organizations today. Organizations are adopting the data lake model because lakes provide raw data that users can use for data experimentation and advanced analytics. A data lake can be a merging point of new and historic data, thereby drawing correlations across all data using advanced analytics. A data lake can support self-service data practices, tapping undiscovered business value from various new as well as existing data sources. Furthermore, a data lake can aid in modernizing data warehousing, analytics, and data integration. However, lakes also face hindrances such as immature governance, user skill gaps, and security concerns.
Using Data Lakes to Sail Through Your Sales Goals (IrshadKhan682442)
To know more visit here: https://www.denave.com/resources/ebooks/using-data-lakes-to-sail-through-your-sales-goals/
The volume, variety, velocity and veracity of big data are getting increasingly complex with each passing day. The way data is stored, processed, managed and shared with decision-makers is affected by this complexity, and to tackle it, a revolutionary approach to data management has come into the picture: the data lake.
Top 60+ Data Warehouse Interview Questions and Answers (Datacademy.ai)
This is a comprehensive guide to the most frequently asked data warehouse interview questions and answers. It covers a wide range of topics including data warehousing concepts, ETL processes, dimensional modeling, data storage, and more. The guide aims to assist job seekers, students, and professionals in preparing for data warehouse job interviews and exams.
We live in a world that is heavily dependent on technology. With the increased dependency on technology, the dependency on data has also increased. In the realm of data-driven decision making, the role of big data has transformed the landscape of data storage and analysis.
Modern Integrated Data Environment - Whitepaper | Qubole (Vasu S)
A white paper about building a modern data platform for data-driven organisations, using a cloud data warehouse with a modern data platform architecture.
https://www.qubole.com/resources/white-papers/modern-integrated-data-environment
At Polestar, we hope to bring the power of data to organizations across industries, helping them analyze billions of data points and data sets to provide real-time insights and enabling them to make critical decisions to grow their business.
Optimising Data Lakes for Financial Services (Andrew Carr)
By using a data lake, you can potentially do more with your company’s data than ever before.
You can gather insights by combining previously disparate data sets, optimise your operations, and build new products. However, how you design the architecture and implementation can significantly impact the results. In this white paper, we propose a number of ways to tackle such challenges and optimise the data lake to ensure it fulfils its desired function.
Hadoop was born out of the need to process Big Data. Today, data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; this is where Big Data technology comes in. Today, the Hadoop software stack is the go-to framework for large-scale, data-intensive storage and compute solutions for Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clustered commodity computers working in parallel. Distributing data that is too large across the nodes in a cluster solves the problem of data sets too large to be processed on a single machine.
Beyond the Basics - Evolving Trends in Data Storage Strategies (kelyn Technology)
Today, we can say that the true future of enterprise data storage is mainly characterized by innovation, agility, and scalability.
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group (Scott Mitchell)
This presentation was given at the July 8, 2014 user group meeting for BI Reporting for Bay Area Startups.
Content - Creation Infocepts/DWApplications
Presented by: Scott Mitchell - DWApplications
How 3 trends are shaping analytics and data management (Abhishek Sood)
Explore how 3 current trends are shaping modern data environments and learn about the impact of non-relational databases, big data, cloud data integration, self-service analytics, and more.
Implementing the Enterprise Data Lake
A four-stage approach to building a massive, easily accessible, flexible and scalable Big Data repository

What is a Data Lake, and how does it help meet the challenges of Big Data? Will the Enterprise Data Warehouse (EDW) and the Data Lake coexist? If so, how? This paper explores what it takes to get started on the journey toward incorporating a Data Lake into an organization’s architecture.
www.impetus.com
Introduction

Data is like money. The world runs on it. We believe it’s valuable, and we are pretty sure we can’t have too much of it. We save it, store it, move it around in various formats, and use it for all kinds of purposes. We will also take it in just about any form we can get it. We might not know how we’re going to use it, but we’re willing to collect it now and figure out what to do with it later. For the most part, we’re happy as long as it just keeps on coming in.

At least, that’s been true when we thought of data as finite. But, with the advent of Big Data, it’s pouring in like never before - and while we still want it in any form we can get it, structured or unstructured, the issues of storing, managing, and analyzing it are becoming more complex.

Interestingly, despite all the advances of technology, the money that data most closely resembles isn’t the conceptual kind that we refer to when we say it’s “on paper” or that is being traded in nanoseconds in financial markets; rather, data is more like cold hard cash. It’s essentially a physical thing: heavy, a bit cumbersome and hard to move. Large data transfers can take days. You don’t want to move it very frequently, and you don’t want to move it very far. And if you do have to move it, you’d prefer to transport it as safely as possible, maybe via armored truck, for example.

So what to do with it all? How do we store it? How do we manage it? How do we use it? These are the questions that lead us to the Data Lake. But what is a Data Lake, and how does it help meet the challenges of Big Data?
Defining the Problem

Before we talk about what a Data Lake[1] is, let’s define the problem and a term or two a little more clearly. First there is unstructured data. While organizations are amassing massive amounts of data, much of it is unstructured. Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. And that’s a concern, because if it’s not pre-defined or structured, it’s usually difficult to analyze. And if you can’t analyze it, what’s the point? Additionally, structuring it is a laborious, time-consuming task.

However, unstructured data accounts for much of the explosion that is Big Data. It is also widely understood as holding the most promise for gaining new, actionable insights.

Nearly all the data that lives outside of databases is unstructured, including images, videos and log files produced by computers, machines and sensors. Even this document is unstructured data. The sheer volume of it is staggering; unstructured data makes up at least 80 percent of all digitally stored data. And as the data-driven economy grows, the amount of unstructured data only grows.

So, what to do? Enter the Data Lake.
[1] “Data Lake” is one of several interchangeable common terms that could have been used here. Some others are Big Data repository, unified data architecture, and modern data architecture.
What is a Data Lake?

A “Data Lake” is one of several interchangeable terms that are commonly used. Some others are Big Data repository, unified data architecture, and modern data architecture. No matter what it is called, the concept is the same: take the data that’s coming in, in unstructured torrents, and store it where it’s more accessible, flexible and scalable, and able to be analyzed without the need to structure it. Here are two common definitions:

• A Data Lake is a massive, easily accessible, flexible and scalable data repository
• A Data Lake is an enterprise-wide data management platform for analyzing disparate sources of data in their native format

Data Lakes include structured, semi-structured, and unstructured data. They are built on inexpensive computer hardware and are designed for storing uncategorized pools of data, including the following:

• Data immediately of interest
• Data potentially of interest
• Data for which the intended usage is not yet known

The information in the Data Lake is consolidated, both structured and unstructured, in a manner that allows inquiries to be made across the entire body of data all at once. This ability to access all of the data is especially appealing because the true power of Big Data is the ability to correlate insights across previously siloed data warehouses or between structured and unstructured data sources.
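As a minimal illustration of what querying across previously siloed, differently structured sources can look like in practice, the PySpark sketch below joins a relational extract with semi-structured clickstream logs held in the same lake. The paths, column names, and schema are hypothetical; this is one possible realization, not a prescribed design.

```python
from pyspark.sql import SparkSession

# Minimal sketch: correlate a structured CRM extract with raw JSON
# clickstream logs landed in the lake. Paths and fields are hypothetical.
spark = SparkSession.builder.appName("lake-correlation").getOrCreate()

# Structured data offloaded from a relational source
customers = spark.read.parquet("/lake/raw/crm/customers")

# Semi-structured data stored in its native JSON format, no upfront modeling
clicks = spark.read.json("/lake/raw/web/clickstream/")

# Correlate across the two silos in a single query
engagement = (
    clicks.join(customers, on="customer_id", how="inner")
          .groupBy("segment")
          .count()
          .orderBy("count", ascending=False)
)
engagement.show()
```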
Drivers for the Data Lake

Real-time enterprise data analytics are all about improving decision making. With so much data traditionally siloed into different data warehouses, such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Human Capital Management (HCM), and others, it’s almost impossible to make correlations across these somewhat captive data sources. The thinking now is to integrate data silos, build infrastructures that empower data science to improve analytics, and reduce time to market through faster analytical processing. These are some of the drivers behind the new architecture that is a Data Lake.
Limitations of the Current Enterprise Data Warehouse

Why can’t we just use what we’ve always used? For the last several decades, EDWs have served as the foundation for business intelligence and data discovery. The world of data warehousing was a more predictable world, a world where structures and formatting could take place in advance, where hypotheses were drawn, where the content of data was known, and where the scope was restricted and pre-defined. Thus, the metaphor of a warehouse worked well because, like a warehouse, one could organize data the way one might stock shelves.
In the real world of actual shelf-stocking, there are obviously some significant constraints related to physical space and cost. For example, if you were running a logistics company or a retail company, you’d need a physical structure as well as shelves and floor space to store all your pallets and boxes. You’d need a plan for what and where to store your inventory, as well as labels for everything, so that you could efficiently organize the space for shipping and managing your goods.

In the world of data storage, the constraints are the same as physical inventory: it costs to store and it costs to move. However, part of the complexity in the realm of Big Data is that we no longer know what’s in the metaphorical boxes, let alone how we’re going to use it. EDWs are not only costly, but they are not structured to handle the complexities of Big Data. Warehouses work when you can define what’s in the boxes and all the associated logistics. With Big Data, that’s no longer possible.

Thus, what makes the EDW great is also what restricts it. Data warehouses store data in specific static structures and categories that dictate the kind of analysis that is possible on that data. With the emergence of Big Data, this approach falls short because it’s impossible to determine in advance what the data might hold. And in cases where analysis is required in real time, formatting in advance is not an option. The point here is that the world of data has become fluid, not static, and data is available in massive volumes, at near real-time velocity, and in many unstructured forms.

Real data discovery requires that analysts are able to ask questions of the data as train-of-thought demands. The real questions only emerge during the process of the analysis itself, which is not easily done in the EDW world.

What’s needed is an approach that allows business users to siphon off or distill the information they need as they need it. This is the shift that underpins the business Data Lake, and which changes the game to something that better meets the needs of today’s responsive, real-time enterprise.
Capabilities of the Data Lake

What capabilities does the Data Lake bring to the enterprise? What are the capabilities that didn’t exist prior to the Data Lake? Here’s our list of the top four:
Active Archive: Providing Access to Historic Data

An active archive provides a single place to store all your data, in any format, at any volume, indefinitely.

Enterprise data governance policies, and in many cases federal law, deal with the management of data, including how long data must be retained. An active archive allows you to address these kinds of compliance requirements and deliver data on demand to satisfy internal and external regulatory demands. Because it is secure, you control who sees what; because it delivers governance and lineage services, you can trace the access and evolution of your data over time.

Having access to historic data, both raw source information and data archived from conventional relational stores, is extremely valuable in use cases where there are requirements to deliver data on demand, such as health records that you need to keep for a certain amount of time or financial records that need to be kept for regulatory compliance. This capability is very useful for attaining immediate insights instead of waiting for long, drawn-out processes.
Self-Service Exploratory Business Intelligence

In many ways, stored data can be the best currency an organization has to offer. But like all other investments, this comes at a price, as organizations must dedicate money and time to protecting their data. Users frequently want access to enterprise data for reporting, exploration, and analysis. But production enterprise data warehouse systems often need to be protected from casual use so they can run the mission-critical financial and operational workloads they support.

An enterprise Data Lake allows users to explore data, with full security, using traditional interactive business intelligence tools via SQL and keyword search.
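To make the self-service idea concrete, here is a small, hedged sketch of an analyst issuing ad-hoc SQL against lake data through HiveServer2, using the PyHive client as one of several possible connection options. The host, credentials, table, and partition column are all hypothetical.

```python
from pyhive import hive

# Minimal sketch: an analyst runs exploratory SQL against data in the lake
# through HiveServer2. Host, user, and table names are hypothetical.
conn = hive.connect(host="hive.example.com", port=10000, username="analyst")
cursor = conn.cursor()

# Exploratory query over raw clickstream data, no pre-built cube required
cursor.execute("""
    SELECT referrer, COUNT(*) AS visits
    FROM raw_clickstream
    WHERE ds = '2016-01-01'
    GROUP BY referrer
    ORDER BY visits DESC
    LIMIT 10
""")
for referrer, visits in cursor.fetchall():
    print(referrer, visits)
```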
Advanced Analytics: Far Beyond Data Sampling

A secret of many data analysis projects is that calculations are based on representative samples of the data rather than full sets. While this works nicely if you’re trying to determine whether an Oscar-nominated film is likely to win an Academy Award based on its popularity compared to other nominated films, what if you are a researcher at the Centers for Disease Control trying to determine the cause of an outbreak, or an investment banker trying to measure risk, or a retailer wanting to understand customer motivations across channels? The bottom line is that you are much better off with the ability to search and analyze data on a large scale and at a granular level, rather than just sampling the data. Data Lakes provide that level of fidelity.
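The difference is easy to see in code. This hedged PySpark sketch contrasts a sampled estimate with a full-scan aggregate over the same hypothetical transactions data; on a cluster, the full scan remains feasible because the work is distributed.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch: sampled estimate vs. full-fidelity scan.
# The path and columns are hypothetical.
spark = SparkSession.builder.appName("beyond-sampling").getOrCreate()
txns = spark.read.parquet("/lake/raw/transactions")

# A 1% sample: fast, but rare events (fraud, outbreaks) may vanish from it
sample_estimate = (
    txns.sample(fraction=0.01, seed=42)
        .groupBy("channel").agg(F.avg("amount").alias("avg_amount"))
)

# The full scan: every record, at granular level, distributed across the cluster
full_answer = txns.groupBy("channel").agg(F.avg("amount").alias("avg_amount"))

sample_estimate.show()
full_answer.show()
```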
Low Cost of Transformation: Optimizing Workloads

Extract, Transform and Load (ETL) workloads that had previously run on expensive systems can now migrate to the enterprise Data Lake, where they run at very low cost, in parallel, and much faster than before. Optimizing the placement of these workloads frees capacity on high-end analytic and data warehouse systems, making them more valuable by allowing them to concentrate on the business-critical applications that they process.
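As an illustration of what such an offloaded transformation might look like, here is a hedged PySpark sketch of a typical cleanse-and-conform step running in parallel on the lake. The source paths, columns, and business rules are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch of an ETL step offloaded from an expensive EDW to the lake.
# Paths, columns, and rules are hypothetical.
spark = SparkSession.builder.appName("etl-offload").getOrCreate()

raw_orders = spark.read.json("/lake/raw/orders/")

# Cleanse and conform: the kind of transformation that used to run on the
# warehouse now runs in parallel on commodity hardware
conformed = (
    raw_orders
    .dropDuplicates(["order_id"])
    .filter(F.col("amount") > 0)
    .withColumn("order_date", F.to_date("order_ts"))
)

# Land the conformed data for downstream consumption
conformed.write.mode("overwrite").parquet("/lake/conformed/orders/")
```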
Adjunct to the EDW

With this newfound wealth of data, we’re also experiencing a cultural shift toward democratization of data. Leading organizations are now saying, “Here, have some. Let’s let everybody have access and see what they can do with it.” This is due to the growing recognition that the more an organization can harness information, the greater the value it derives from deeper insights. For this reason, organizations are removing blocks to innovation and transforming the way data contributes to success.

Serving as an adjunct to the EDW, a Data Lake can:

• Work in tandem with the EDW and allow you to offload colder data to the Data Lake.
• Allow you to work with unstructured data.
• Support a cultural shift towards democratized data access.
• Contain costs while continuing to do more with more data.

Sounds compelling, doesn’t it? But how do you know if you are ready for a Data Lake?
Determining Readiness: Some Questions to Ask

Here are some of the critical drivers that indicate readiness:

• Are you working with a growing amount of unstructured data?
• Are your lines of business demanding even more unstructured data?
• Does your organization need a unified view of information?
• Do you need to be able to perform real-time analysis on source data?
• Is your organization moving toward a culture of democratized data access?
• Would your organization benefit from elasticity of scale?
Design Principles

If you are ready for a Data Lake, there are some key design principles that we recommend following. Here are our top five:

• Discovery without limitations
• Low latency at any scale
• Movement from a reactive model to a predictive model
• Elasticity in infrastructure
• Affordability

One of the most significant reasons to build a Data Lake is to encourage experimentation and to move from an intuition-based model to a more comprehensive, empirical, data-science-driven model. In order to enable that kind of experimentation and analytical finesse to thrive, you have to allow for discovery without limitations. By that, we mean you have to be willing and able to give users access to all that data. You should also be able to perform low-latency queries of data at any scale.

Now let’s talk about what it takes to build a Data Lake.
Not a Big Bang Approach: The Four Stages to Building a Data Lake

From our experience, building a Data Lake doesn’t happen all at once; instead, there are stages of maturity:

1. Handling and ingesting data at scale
2. Building analytical muscle
3. Leveraging the strengths of the EDW and the Data Lake
4. Adopting broadly
1. Handling and Ingesting Data at Scale

This first stage involves getting the architecture in place and learning to acquire and transform data at scale. This is when your organization will need to determine the new and existing data sources that it can leverage. These data sources are then integrated, and the volume and variety of data is ingested at high velocity into Hadoop storage. At this stage, the analytics may be rather simple, consisting possibly of simple transformations, but it’s an important step in discovering how to make Hadoop work the way you want.
2. Building Analytical Muscle

The second stage focuses on improving the ability to transform and analyze data. This is where you begin to really leverage the enterprise Data Lake. For example, your organization can start building batch, mini-batch, and real-time applications for enterprise usage, exploratory analytics, and predictive use cases. Various tools and frameworks are used at this stage. The EDW and the Data Lake start working together.
3. Leveraging the Strengths of the EDW and the Data Lake

This is when the orchestra really starts to play. Here, in the third stage, you will want to get data and analytics into the hands of as many people in the organization as possible. Democratization begins. This is also the stage where the EDW and the Hadoop-based Big Data lake truly co-exist, allowing the enterprise to leverage the strengths of each architecture.
4. Adopting Broadly

The fourth level is the highest stage of maturity in the Data Lake. Enterprise capabilities are added to the Data Lake. Broad adoption of unified Data Lake architectures requires information governance, compliance, security, auditing, metadata management and information lifecycle management capabilities. Not addressing these issues may result in slow enterprise adoption, and runs the risk that the Data Lake eventually becomes a “data swamp.”
The Big Data Lake: Understanding the Essentials

Understanding the layers of the data warehouse is an imperative step in the Big Data journey. The following pages elucidate the components of a Big Data warehouse and the methodology to set it up.

Components of the Big Data Warehouse

While requirements and specific business needs may vary within each organization, the following diagram lists the major components of a Big Data warehouse.
Data Sources

An enterprise usually has the following sources of data:

• A relational database such as Oracle, DB2, PostgreSQL, SQL Server and the like
• Multiple disparate, unstructured and semi-structured data sources which may have data in formats such as flat files, XML, JSON or CSV
• Existing systems that may further provide integration data in EDI or other B2B exchange formats
• Machine data and network elements generating huge volumes of data
Hadoop Distribution

Hadoop is the most popular choice for Big Data today and is available in open source Apache and commercial distribution packages. Hadoop includes a file system called HDFS (Hadoop Distributed File System), which forms the key data storage layer of the Big Data warehouse. Other options are also available, such as GPFS (from IBM) and S3 (from the Amazon cloud).
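For a feel of how applications interact with this storage layer, here is a small, hedged sketch using the WebHDFS-based Python `hdfs` client, which is one client option among several. The NameNode address, user, and paths are hypothetical.

```python
from hdfs import InsecureClient

# Minimal sketch: land a local extract in HDFS and inspect the raw zone.
# NameNode URL, user, and paths are hypothetical.
client = InsecureClient("http://namenode.example.com:50070", user="etl")

# Upload a raw file extract into the lake's landing area
client.upload("/lake/raw/orders/orders_2016-01-01.csv",
              "exports/orders_2016-01-01.csv")

# List what has landed so far
print(client.list("/lake/raw/orders"))
```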
Data Ingestion

It is imperative to set up reliable and scalable data ingestion mechanisms to bring data in from data sources to the Hadoop file system.

• For connecting relational databases, the most popular options are Sqoop and database-specific connectors
• For streaming data, Apache Kafka and Flume are quite popular
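To sketch what streaming ingestion can look like in practice, here is a hedged PySpark Structured Streaming example that reads a Kafka topic and lands the raw events in the lake. The broker address, topic, and paths are hypothetical, and Flume or a plain Kafka consumer would be alternatives.

```python
from pyspark.sql import SparkSession

# Minimal sketch: continuously ingest a Kafka topic into raw lake storage.
# Broker, topic, and paths are hypothetical; assumes the spark-sql-kafka
# connector package is available on the classpath.
spark = SparkSession.builder.appName("lake-ingestion").getOrCreate()

events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "web-events")
         .load()
)

# Keep the payload in its native form; structure is applied later, on read
raw = events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

query = (
    raw.writeStream
       .format("parquet")
       .option("path", "/lake/raw/web-events/")
       .option("checkpointLocation", "/lake/checkpoints/web-events/")
       .start()
)
query.awaitTermination()
```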
Figure 1: Big Data warehouse reference architecture by Impetus. The diagram shows data sources (relational data such as PostgreSQL, Oracle, DB2 and SQL Server; flat files/XML/JSON/CSV; existing systems; machine data and network elements) feeding a data ingestion layer (Sqoop/connectors, Kafka/Flume, existing DI tools, REST/JDBC/SOAP/custom, streaming), which lands in Big Data storage served by a data query layer (relational offload engine: Hive/Pig/Drill/Spark SQL), an access layer (search, pipelines, cubes, data store/NoSQL), virtualization (federation, delivery, polyglot mapper), management (provisioning, monitoring, performance optimization, security) and governance (data quality, lifecycle management, data classification, information policy), supporting business intelligence, machine data analysis, predictive and statistical analysis, data discovery, and visualization and reporting.
• Organizations that need to leverage streaming data sources to set up an entire topology of streaming source, ingestion, in-flight transformation and data persistence would need to use one of the common CEP (Complex Event Processing) or streaming engines, such as Apache Storm or StreamAnalytix.
• Organizations that need to leverage their existing Data Integration (DI) connectors may need custom scripts to integrate using REST, SOAP or JDBC components.
Data Query

For the data resident in HDFS, there are a multitude of query engines, such as Pig, Hive, Apache Drill and Spark SQL.

Many organizations, however, would prefer to re-use the SQL scripts and procedures written for their traditional enterprise data warehouse. Because they have already invested millions of dollars in traditional SQL and PL/SQL engines, it is understandable that organizations want to explore mechanisms that will allow them to offload data tables from relational data warehouses to a Big Data warehouse while keeping their querying/reporting scripts intact.

Tools and solutions are now available from organizations like Impetus Technologies which help the enterprise offload expensive computing from relational data warehouses to Big Data warehouses without re-writing the entire processing layer.
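To illustrate the idea of keeping warehouse-style SQL intact over offloaded tables, here is a hedged PySpark sketch that registers lake data under its old table names and runs a reporting query against it largely unchanged. The table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal sketch: run an existing warehouse-style SQL report over tables
# offloaded to the lake. Paths and names are hypothetical.
spark = SparkSession.builder.appName("sql-offload").getOrCreate()

# Expose offloaded Parquet extracts under their old table names
spark.read.parquet("/lake/offload/sales_fact").createOrReplaceTempView("sales_fact")
spark.read.parquet("/lake/offload/dim_region").createOrReplaceTempView("dim_region")

# The reporting SQL itself can stay (largely) as it was on the EDW
report = spark.sql("""
    SELECT r.region_name, SUM(s.revenue) AS total_revenue
    FROM sales_fact s
    JOIN dim_region r ON s.region_id = r.region_id
    GROUP BY r.region_name
    ORDER BY total_revenue DESC
""")
report.show()
```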
Data Stores

Along with HDFS, there is a trend to couple a data store or NoSQL database such as HBase or Cassandra with the Big Data warehouse. These stores provide additional functions in the form of columnar storage, schemaless storage, querying, OLAP/OLTP and application integration.
Access

With data stored in the HDFS or NoSQL layer, organizations have increasingly complex access requirements. These include features from the traditional world, like search and cube functions. There are also new tools which help manage complex pipelines of jobs, where the output of one query may be fed as input into another.
Governance

Ensuring data quality is the key reason for data governance in the Big Data warehouse.

• While the aim of the Big Data warehouse is to offer a Data Lake integrated with all enterprise data sources, it is still essential to apply data quality regulations to ensure the Data Lake does not turn into a data swamp
• Similarly, data users increasingly need to make sure that they are able to manage the data through its entire lifecycle
• Classifying data based on various segments, such as business user group (for instance, marketing, risk management, operations, etc.), ensures control and governance of the data
• It is also imperative to define enterprise-level information policies to avoid breaches and ensure control over the entire data warehouse
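As one small, hedged example of such a data quality regulation, the sketch below gates promotion of data from a raw zone to a curated zone behind simple validation checks. The zone layout, rules, and thresholds are hypothetical illustrations, not a prescribed policy.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch: gate promotion from the raw zone to a curated zone
# behind simple data quality rules. Paths and thresholds are hypothetical.
spark = SparkSession.builder.appName("dq-gate").getOrCreate()
orders = spark.read.parquet("/lake/raw/orders")

total = orders.count()
null_ids = orders.filter(F.col("order_id").isNull()).count()
dupes = total - orders.dropDuplicates(["order_id"]).count()

# Example policy: fewer than 1% null keys and no duplicate keys
if total > 0 and null_ids / total < 0.01 and dupes == 0:
    orders.write.mode("overwrite").parquet("/lake/curated/orders")
else:
    raise ValueError(
        f"Data quality gate failed: {null_ids} null ids, {dupes} duplicates"
    )
```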
Virtualization

Organizations have found that despite their best intentions and use cases, they may have to deal with the coexistence of the enterprise data warehouse and the Big Data warehouse for a period of time. To ensure consistent results with appropriate polyglot querying, the federation of data and delivery mechanisms are essential.

Management

To provision and monitor the entire cluster, operations teams need to have handy tools and dashboards for cluster management. It is not uncommon to find engineers trying to diagnose the performance of MapReduce jobs and queries in their quest for optimal speed and minimal resource consumption. Security is another key aspect of the warehouse, with authentication and role-based authorization behind defined gateways.

Business Intelligence

The goal of the warehouse is to achieve business insights and generate intelligence for the organization. To achieve that objective, business teams need to be empowered with various visualization and reporting tools. Data scientists can also help discover data patterns using predictive/statistical algorithms and machine data analytics.

Stages for Setting up a Big Data Warehouse

The journey to a Big Data warehouse is a multi-stage process. It requires selecting the right tools, keeping a clear vision, and following a process to lay out an effective and integrated data warehouse. The key stages for setting up a Big Data warehouse broadly include the following:

Stage One: Handle and Ingest Data at Scale

As the first stage, the organization needs to determine the existing and new data sources that it can leverage.

• The data sources are integrated and the variety of voluminous data is ingested at high velocity into Hadoop storage
• The incoming data may be of varied formats, ranging from unstructured data, structured data, streaming data and machine geo-spatial time series data to external data sets like social media data

Figure 2: Handle and ingest data at scale. The diagram shows streaming, unstructured, structured, machine geo-spatial time series, and external social data flowing through landing and ingestion into Big Data storage.
Stage Two: Build the Analytical Muscle

In order to leverage the enterprise Data Lake in Hadoop, the organization builds batch, mini-batch and real-time applications for enterprise usage, exploratory analytics and predictive use cases. Various tools and frameworks are utilized in this stage as organizations begin to:

• Explore advanced querying engines, starting with MapReduce and moving on to Apache Spark, Flink, etc., for interactive results
• Build use cases for both batch and real-time processing using streaming solutions like Apache Storm and StreamAnalytix
• Build analytic applications for enterprise adoption, exploration, discovery and prediction
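As a hedged sketch of the predictive use cases this stage enables, the example below trains a simple churn classifier with Spark MLlib directly over lake data. The dataset path, feature columns, and label are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Minimal sketch: a predictive model trained directly on lake data.
# Path, feature columns, and label are hypothetical.
spark = SparkSession.builder.appName("predictive-muscle").getOrCreate()
df = spark.read.parquet("/lake/conformed/customer_activity")

# Assemble raw columns into the feature vector MLlib expects
assembler = VectorAssembler(
    inputCols=["logins_30d", "support_tickets", "monthly_spend"],
    outputCol="features",
)
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=7)

model = LogisticRegression(labelCol="churned", featuresCol="features").fit(train)
predictions = model.transform(test)
predictions.select("churned", "prediction").show(10)
```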
Stage Three: Enterprise Data Warehouse and Big Data Warehouse Work in Unison

In a real-world scenario, the enterprise data warehouse (EDW) and the Hadoop-based Big Data warehouse (BDW) would co-exist as follows:

• The organization would leverage the data and specific capabilities of each to its advantage
• Rather than disposing of the expensive enterprise warehouse, organizations like to leverage it alongside Big Data technologies
• Once a stable and mature Big Data warehouse is achieved, the EDW and BDW work in unison to achieve multi-workload distribution and offload to each other as required
• Specialized solutions, like the Impetus relational offload solution, help organizations save millions of dollars with superior time and schedule benefits
Figure 3: Build the analytical muscle. The diagram extends the ingestion flow: streaming, unstructured, structured, machine geo-spatial time series, and external social data pass through landing and ingestion into Big Data storage, which now feeds real-time applications, enterprise applications, exploration and discovery, and predictive applications, under a provisioning, workflow, monitoring and security layer.
Stage Four: Achieving Enterprise Maturity in the Warehouse

For a unified data warehouse, various enterprise-ready capabilities are needed. These are particularly pertinent in the case of information governance, metadata management and information lifecycle management.

While organizations may begin with basic governance paradigms, as they mature in the journey it becomes essential to have more sophisticated practices and policies. Further, the appetite of the user is no longer satisfied by simply exploring data and managing it through its lifespan. Instead, organizations need tools and utilities to handle the laborious tasks of discovering data, providing insights and managing the information lifecycle.
Figure 4: Enterprise data warehouse and Big Data warehouse work in unison. The diagram adds traditional data repositories (RDBMS, MPP) alongside Big Data storage: the same sources flow through landing and ingestion, and both repositories feed real-time applications, enterprise applications, exploration and discovery, and predictive applications, under a provisioning, workflow, monitoring and security layer.
Figure 5: Achieving enterprise maturity in the warehouse. The diagram completes the architecture: streaming, unstructured, structured, machine geo-spatial time series, and external social data flow through landing and ingestion into Big Data storage coexisting with traditional data repositories (RDBMS, MPP), feeding real-time, enterprise, exploration and discovery, and predictive applications, with provisioning, workflow, monitoring and security, all underpinned by governance, information lifecycle, and enterprise metadata management.