Organizations have been collecting, storing, and accessing data from the beginning of computerization. Insights gained from analyzing the data enable them to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
The well-established data architecture, consisting of a data warehouse fed from multiple operational data stores and fronted by BI tools, has served most organizations well. However, over the last two decades, the explosion of internet-scale data and the advent of new approaches to data and computational processing have put this tried-and-true architecture under strain, creating both challenges and opportunities for organizations.
In this green paper, we will discuss modern approaches to data architecture that have evolved to address these challenges and provide a framework for companies to build a data architecture and better adapt to increasing demands of the modern business environment. This discussion of data architecture will be tied to the Data Maturity Journey introduced in EQengineered’s June 2021 green paper on Data Modernization.
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
This is a 200-level run-through of the Microsoft Azure Big Data Analytics for the Cloud data platform, based on the Cortana Intelligence Suite offerings.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ... (Databricks)
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, and converts all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, the architectures and systems they use, and their successful outcomes.
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics (Informatica)
This presentation is geared toward enterprise architects and senior IT leaders looking to drive more value from their data by learning about cloud data lake management.
As businesses focus on leveraging big data to drive digital transformation, technology leaders are struggling to keep pace with the high volume of data coming in at high speed and rapidly evolving technologies. What's needed is an approach that helps you turn petabytes into profit.
Cloud data lakes and cloud data warehouses have emerged as a popular architectural pattern to support next-generation analytics. Informatica's comprehensive AI-driven cloud data lake management solution natively ingests, streams, integrates, cleanses, governs, protects and processes big data workloads in multi-cloud environments.
Please leave any questions or comments below.
Introduces Microsoft’s Data Platform for on-premises and cloud, and the challenges businesses are facing with data and data sources. Understand the evolution of database systems in the modern world, what businesses are doing with their data, and what their new needs are with respect to changing industry landscapes.
Dive into the opportunities available for businesses and industry verticals: those already identified and those not yet explored.
Understand Microsoft’s cloud vision and what the Azure platform offers, as Infrastructure as a Service or Platform as a Service, for you to build your own offerings.
Introduces and demos some real-world scenarios and case studies where businesses have used the cloud and Azure to create new and innovative solutions that unlock this potential.
Analytics in a Day Ft. Synapse Virtual Workshop (CCG)
Say goodbye to data silos! Analytics in a Day will simplify and accelerate your journey towards the modern data warehouse. Join CCG and Microsoft for a half-day virtual workshop, hosted by James McAuliffe.
Enterprise 360 - Graphs at the Center of a Data Fabric (Precisely)
Data fabric architectures are used to simplify and integrate data management across business functions to accelerate digital transformation. Creating a data fabric is a way to develop a data-centric view of your business which results in an Enterprise 360 perspective based on trusted data.
Industry analysts and vendors are increasingly finding that graph databases are a key enabling technology in support of data fabric architectures that deliver trusted data.
During this on-demand webinar, we discuss how we help our customers implement a Data Fabric pattern using graph database technology in support of their key strategic objectives.
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data... (Denodo)
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/Bvmvc9
Data prep and data blending are terms that have come to prominence over the last year or two. On the surface, they appear to offer functionality similar to data virtualization…but there are important differences!
In this session, you will learn:
• How data virtualization complements or contrasts with technologies such as data prep and data blending
• Pros and cons of functionality provided by data prep, data catalog and data blending tools
• When and how to use these different technologies to be most effective
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Data Quality in the Data Hub with RedPoint Global (Caserta)
At a Big Data Warehousing Meetup, George Corugedo, CTO of RedPoint Global demonstrated how to use your big data platform for data integration, data quality and identity resolution to provide a true 360 degree view of your customer on Hadoop using the RedPoint product.
For more information or questions, please contact us at www.casertaconcepts.com.
Modern Data Management for Federal Modernization (Denodo)
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in realizing a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
Big data architectures and the data lake (James Serra)
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different big data architectures which have evolved over time, including the traditional Big Data Architecture, the Streaming Analytics architecture, and the Lambda and Kappa architectures, and presents the mapping of components from both open source and the Oracle stack onto these architectures.
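To make the Lambda/Kappa distinction concrete, here is a minimal, illustrative Python sketch (not from the session itself): a batch layer periodically recomputes a view over the full event log, a speed layer absorbs recent events incrementally, and the serving layer merges both at query time. The toy aggregation and names are assumptions for illustration only.

```python
from collections import defaultdict

def batch_view(all_events):
    """Batch layer: recomputed periodically over the full, immutable event log."""
    totals = defaultdict(int)
    for user, amount in all_events:
        totals[user] += amount
    return dict(totals)

class SpeedLayer:
    """Speed layer: incrementally absorbs events since the last batch run."""
    def __init__(self):
        self.delta = defaultdict(int)

    def on_event(self, user, amount):
        self.delta[user] += amount

def serve(user, batch, speed):
    """Serving layer: merge the precomputed batch view with the real-time delta."""
    return batch.get(user, 0) + speed.delta.get(user, 0)

# Batch run over history, then two late events absorbed by the speed layer.
history = [("alice", 10), ("bob", 5), ("alice", 7)]
batch = batch_view(history)
speed = SpeedLayer()
speed.on_event("alice", 3)
speed.on_event("bob", 1)
print(serve("alice", batch, speed))  # 20 = 17 (batch) + 3 (speed)
```

In a Kappa architecture, the dedicated batch layer disappears: the same streaming code is simply replayed over the full log whenever state must be rebuilt.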
How OpenTable uses Big Data to impact growth by Raman Marya (Data Con LA)
Abstract: We have created a variety of analytics solutions combining data from our data lake with a traditional DW: data APIs fed into the product to improve conversions, a churn prediction algorithm to help account managers focus on high-risk customers, and analytics as an edge to empower the sales team to win prospective customers.
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
Ajit Jaokar, Data Science for IoT professor at Oxford University “Enterprise ... (Dataconomy Media)
“Enterprise AI - Artificial Intelligence for the Enterprise."
AI is impacting many areas today. This talk discusses how AI will impact the enterprise and what it means in the near future. The talk is based on the course I teach at the University of Oxford.
Building the Data Lake with Azure Data Factory and Data Lake Analytics (Khalid Salama)
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides means to ingest data, perform scalable big data processing, and serve information, in addition to means to manage, monitor, and secure the environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build an Azure Data Factory pipeline to ingest data into the lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
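Since the slides center on landing raw extracts in the lake before any processing, here is a minimal, tool-agnostic Python sketch of that landing-zone pattern; it stands in for what an Azure Data Factory copy activity would do, and the paths, folder layout, and names are illustrative assumptions rather than the deck's actual pipeline:

```python
import shutil
from datetime import date
from pathlib import Path

def land_raw_extract(src_file: str, lake_root: str, source_system: str) -> Path:
    """Copy a raw source extract, unchanged, into a date-partitioned raw zone,
    so downstream big data processing can reprocess any day's files."""
    today = date.today()
    target_dir = (Path(lake_root) / "raw" / source_system /
                  f"{today:%Y}" / f"{today:%m}" / f"{today:%d}")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(src_file).name
    shutil.copy2(src_file, target)  # keep the raw bytes untouched
    return target

# Hypothetical usage:
# land_raw_extract("orders.csv", "/datalake", "erp")
# -> /datalake/raw/erp/2024/01/15/orders.csv
```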
"Startupbootcamp and Data startups", Angel Garcia, Co-Founder and Tech Mentor...Dataconomy Media
"Startupbootcamp and Data startups", Angel Garcia, Co-Founder and Tech Mentor at Startupbootcamp IoT & Data in Barcelona
Watch more from Data Natives Tel Aviv 2016 here: http://bit.ly/2hw1MY0
Visit the conference website to learn more: http://telaviv.datanatives.io/
About the Author:
Angel is currently a Founding Partner at Startupbootcamp Internet of Things & Data and Lanta Digital Ventures. Startupbootcamp is Europe’s first, global and leading accelerator program for startups, and Lanta is an early-stage venture capital fund that invests in innovative start-ups. Angel is an experienced executive, entrepreneur, and investor, with broad experience of more than 15 years in international environments in both Asia and the US building up startup projects. Angel is a shareholder at Fractus, a well-known growing European technology start-up in the global telecom industry, which is currently implementing a patent licensing program that has collected close to $100 million in royalties to date from top worldwide cellphone manufacturers.
Artificial Intelligence and The Future of Trust - Stéphane Bura (WithTheBest)
AI is fundamentally changing the way we interact with our devices, and AI has a trust issue. This talk covers opening the black box, feedback, and AI as an interaction partner built on intention-based interactions: the future and the economy of trust.
Stéphane Bura, Chief Product Officer, Weave.ai
Overview of Blue Medora - New Relic Plugin for HP Blade Servers (Blue Medora)
Overview of Blue Medora's New Relic Plugin for HP Blade Servers. The Blue Medora New Relic Plugin for HP Blade Servers provides support for New Relic Plugins as well as New Relic Insights.
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa... (Zaloni)
When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma, thought leader and coauthor of Architecting Data Lakes, offers lessons learned from the field to get you started.
Qiagram is a collaborative visual data exploration environment that enables investigator-initiated, hypothesis-driven data exploration, allowing business users as well as IT professionals to easily ask complex questions against complex data sets.
Balancing data democratization with comprehensive information governance: bui... (DataWorks Summit)
If information is the new oil, then governance is its “safety data sheet.” As demand for data as the raw material for competitive differentiation continues to rise, enterprises are facing bigger challenges in identifying and valuing data and ensuring its appropriate use to extract the right information. In order to make effective business decisions, organizations need to have trust in their data so that they can impute the right value and use it for the right purposes while satisfying any organizational or regulatory mandates. A number of analytics and data science initiatives fail to reach their potential due to the lack of an information governance framework. Robust information governance capabilities can help organizations develop trust in their data and empower them to make decisions confidently.
In this session Sanjeev Mohan, Research Analyst at Gartner, and Srikanth Venkat, Sr. Director of Product Management at Hortonworks, will walk you through an end-to-end architectural blueprint for information governance and best practices for helping organizations understand, secure, and govern diverse types of data in enterprise data lakes.
Speaker
Sanjeev Mohan, Gartner, Research Analyst
Srikanth Venkat, Hortonworks, Senior Director, Product Management
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th... (Denodo)
Watch full webinar here: https://buff.ly/46pRfV7
This Denodo session explores the power of data virtualization, shedding light on its architecture, customer value, and a diverse range of use cases. Attendees will discover how the Denodo Platform enables seamless connectivity to various data sources while effortlessly combining, cleansing, and delivering data through 5 differentiated use cases.
Architecture: Delve into the core architecture of the Denodo Platform and learn how it empowers organizations to create a unified virtual data layer. Understand how data is accessed, integrated, and delivered in a real-time, agile manner.
Value for the Customer: Explore the tangible benefits that Denodo offers to its customers. From cost savings to improved decision-making, discover how the Denodo Platform helps organizations derive maximum value from their data assets.
Five Different Use Cases: Uncover five real-world use cases where Denodo's data virtualization platform has made a significant impact. From data governance to analytics, Denodo proves its versatility across a variety of domains.
- Logical Data Fabric
- Self Service Analytics
- Data Governance
- 360 degree of Entities
- Hybrid/Multi-Cloud Integration
Watch this illuminating session to gain insights into the transformative capabilities of the Denodo Platform.
Building a New Platform for Customer Analytics (Caserta)
Caserta Concepts and Databricks partner up to bring you this insightful webinar on how a business can choose from all of the emerging big data technologies to figure out which one best fits their needs.
AWS Summit Singapore - Accelerate Digital Transformation through AI-powered C... (Amazon Web Services)
Andrew McIntyre, Director of Strategic ISV Alliances, Informatica
Modernizing your analytics capabilities to deliver rapid new insights is critical to successfully drive data-driven digital transformation. Many organizations find it challenging to connect, understand, and deliver the right data to generate new insights. Learn about the latest patterns, solutions, and benefits of Informatica's next-generation Enterprise Data Management platform to unleash the power of your data through the modern cloud data infrastructure of AWS. See how you can accelerate AI-driven next-generation analytics by cataloging and integrating structured and unstructured data from hundreds of on-premises and cloud data sources.
Why an AI-Powered Data Catalog Tool is Critical to Business Success (Informatica)
Imagine a fast, more efficient business thriving on trusted data-driven decisions. An intelligent data catalog can help your organization discover, organize, and inventory all data assets across the org and democratize data with the right balance of governance and flexibility. Informatica's data catalog tools are powered by AI and can automate tedious data management tasks and offer immediate recommendations based on derived business intelligence. We offer data catalog workshops globally. Visit Informatica.com to attend one near you.
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Data Virtualization: Introduction and Business Value (UK) (Denodo)
Watch full webinar here: https://bit.ly/30mHuYH
Having evolved as the most agile and real-time approach to enterprise data integration, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics. Denodo’s vision is to provide a unified data delivery layer as a logical data fabric to bridge the gap between IT and the business, hiding the underlying complexity and creating a semantic layer that exposes data in a business-friendly manner.
Attend this webinar to learn:
- What data virtualization really is
- How it differs from other enterprise data integration technologies
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations
- Business Value of data virtualization and customer use cases
- Highlights of the newly launched Denodo Platform 8.0
ADV Slides: Building and Growing Organizational Analytics with Data Lakes (DATAVERSITY)
Data lakes are providing immense value to organizations embracing data science.
In this webinar, William will discuss the value of having broad, detailed, and seemingly obscure data available in cloud storage for purposes of expanding Data Science in the organization.
Data lakes can scale in step with the cloud, break down integration barriers and data stored in silos, and pave the way for new business opportunities. All of this contributes to a better basis for decision-making for management and employees. Come and hear how.
David Bojsen, Architect, Microsoft
Data Virtualization. An Introduction (ASEAN) (Denodo)
Watch full webinar here: https://bit.ly/3uiXVoC
What is Data Virtualization and why do I care? In this webinar we intend to help you understand not only what data virtualization is, but why it is a critical component of any organization's data fabric and how it fits: how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Watch on-demand this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise? Where does it fit?
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs... (Dataconomy Media)
The challenges of increasing complexity of organizations, companies and projects are obvious and omnipresent. Everywhere there are connections and dependencies that are often not adequately managed or not considered at all because of a lack of technology or expertise to uncover and leverage the relationships in data and information. In his presentation, Axel Morgner talks about graph technology and knowledge graphs as indispensable building blocks for successful companies.
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ... (Dataconomy Media)
Every day we are challenged with more data, more use cases, and an ever-increasing demand for analytics. In this talk Bjorn will explain how autonomous data management and machine learning help innovators be more productive, and give examples of how to deliver new data-driven projects with less risk at lower cost.
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo... (Dataconomy Media)
Compliance departments within banks and other financial institutions are turning to machine learning for improving their Anti Money Laundering compliance activities. Today, the systems that aim to detect potentially suspicious activity are commonly rule-based, and suffer from ultra-high false positive rates. DataRobot will discuss how their Automated Machine Learning platform was successfully used for a real use case to reduce their false positives and to enhance their Anti-Money Laundering activities.
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So... (Dataconomy Media)
Trump, Brexit, Cambridge Analytica... In the last few years, we have had to confront the consequences of the use and misuse of data science algorithms in manipulating public opinion through social media. The use of private data to microtarget individuals is a daily practice (and a trillion-dollar industry), which has serious side-effects when the selling product is your political ideology. How can we cope with this new scenario?
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de... (Dataconomy Media)
When taking a deep dive into the world of data, one thing is certain: the ultimate goal is to create something new, something better, something faster. In other words, innovation should always be at the forefront of companies’ strategic outlook, whether their goal is to pioneer new processes, user experiences, products or services.
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness... (Dataconomy Media)
What does it take to build a good data product or service? Data practitioners always think about the technology, user experience and commercial viability. But rarely do they think about the implications of the systems they build. This talk will shed light on the impact of AI systems and the unintended consequences of the use of data in different products. It will also discuss our role, as data practitioners, in planting the seeds of fairness in the systems we build.
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe... (Dataconomy Media)
We all hear about the power of data, big data, and data analysis in today's marketplace, but we rarely feel their tangible effects on our own business decisions and performance.
Let's dive in and see how people analytics can increase people performance, motivation, and business revenue.
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -... (Dataconomy Media)
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data, keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
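As a concrete illustration of the distributed redundancy the talk refers to, here is a minimal sketch using the Python cassandra-driver; the hosts, keyspace, and datacenter names are hypothetical, not from the talk:

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# With NetworkTopologyStrategy, every row is replicated to several nodes per
# datacenter, so a server, a rack, or even a whole datacenter can go down
# without the data becoming unavailable.
cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])  # illustrative hosts
session = cluster.connect()

# Three replicas in each of two datacenters (names are assumptions).
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS bookings
    WITH replication = {
        'class': 'NetworkTopologyStrategy', 'dc_eu': 3, 'dc_us': 3
    }
""")
```

With this layout, LOCAL_QUORUM reads and writes (2 of 3 replicas) remain available and consistent within a datacenter even while one replica is down.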
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th... (Dataconomy Media)
In the data industry, having correctly labelled datasets is vital. Timothy Thatcher explains how tagging your data at scale, while considering time, location, and complex hierarchical rules, can be handled.
Data Natives Berlin v 20.0 | "Serving A/B experimentation platform end-to-end"... (Dataconomy Media)
During the lifetime of an A/B test, product managers and analysts at GetYourGuide require various tools and different kinds of data to plan the trial properly, control it during the run, and analyze the results at the end. This talk is about the architecture, tools, and data flow for serving their needs.
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A... (Dataconomy Media)
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data, keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ... (Dataconomy Media)
Creativity is the mental ability to create new ideas and designs. Innovation, on the other hand, means developing useful solutions from new ideas. Creativity can be goal-oriented, whereas innovation is always goal-oriented; this means that innovation aims to achieve defined goals. The use of cloud services and technologies promises enterprise users many benefits in terms of more flexible use of IT resources and faster access to innovative solutions. That's why, in this talk, we want to examine the question of what role cloud computing plays in innovation in companies.
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -... (Dataconomy Media)
A presentation of the time series properties of financial instruments and the possibilities for frequency decomposition and information extraction using the FT, STFT, and wavelets, with an outlook on current research on wavelet neural networks.
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ... (Dataconomy Media)
"With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data for ETL, and hours to train models. It's also hard to scale, with data sets increasingly being larger than the capacity of any single server. The amount of the data also makes it hard to incrementally test and retrain models in near real-time.
Learn how Apache Ignite and GridGain help to address limitations like ETL costs, scaling issues and Time-To-Market for the new models and help achieve near-real-time, continuous learning.
Yuriy Babak, the head of ML/DL framework development at GridGain and Apache Ignite committer, will explain how ML/DL work with Apache Ignite, and how to get started.
Topics include:
— Overview of distributed ML/DL including architecture, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including built-in ML/DL algorithms, and how to implement your own
— Model inference with Apache Ignite, including how to train models with other libraries, like Apache Spark, and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and inference"
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz... (Dataconomy Media)
"Machine learning algorithms require significant amounts of training data which has been centralized on one machine or in a datacenter so far. For numerous applications, such need of collecting data can be extremely privacy-invasive. Recent advancements in AI research approach this issue by a new paradigm of training AI models, i.e., Federated Learning.
In federated learning, edge devices (phones, computers, cars etc.) collaboratively learn a shared AI model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. From personal data perspective, this paradigm enables a way of training a model on the device without directly inspecting users’ data on a server. This talk will pinpoint several examples of AI applications benefiting from federated learning and the likely future of privacy-aware systems."
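The core of the paradigm described above is federated averaging (FedAvg): clients train locally and only model weights travel. Here is a minimal NumPy sketch using toy linear models rather than any production framework; all names and data are illustrative, and real systems add secure aggregation, client sampling, and more:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client: a few gradient steps on data that never leaves the device."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """Server: average the clients' locally trained weights, never their data."""
    updates = [local_update(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(updates, axis=0, weights=sizes)  # weight by data size

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):  # five edge devices with private data
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print(w)  # approaches [2, -1] without raw data ever leaving a client
```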
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first-ever open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
- Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
- Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
- Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
- Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
- AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
- Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
- Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (see the sketch after this list).
- Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
- Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision-making.
- Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
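As a concrete example of the automated data validation called for in point 4, here is a minimal Python sketch; the rules and field names are hypothetical, and in practice such checks would run inside the ingestion pipeline so bad records are quarantined at the source:

```python
import re

# Illustrative validation rules: field name -> predicate that must hold.
RULES = {
    "order_id": lambda v: bool(re.fullmatch(r"ORD-\d{6}", str(v))),
    "amount":   lambda v: isinstance(v, (int, float)) and v >= 0,
    "country":  lambda v: v in {"DE", "FR", "US"},
}

def validate(record: dict) -> list[str]:
    """Return the fields that violate a rule; an empty list means clean."""
    return [field for field, rule in RULES.items()
            if field not in record or not rule(record[field])]

clean, quarantined = [], []
for rec in [{"order_id": "ORD-000123", "amount": 9.5, "country": "DE"},
            {"order_id": "123", "amount": -4, "country": "XX"}]:
    (clean if not validate(rec) else quarantined).append(rec)

print(len(clean), "clean;", len(quarantined), "quarantined")  # 1 clean; 1 quarantined
```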
Benefits of a Precise Ecosystem:
- Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
- Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
- Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
- Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
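To make the first optimization concrete, here is a minimal Python sketch of power-iteration PageRank that skips vertices whose ranks have already converged; the graph and tolerance are illustrative, and the remaining techniques (in-identical vertices, chain short-circuiting, SCC ordering as in STICD) would prune work further on top of the same loop:

```python
import numpy as np

def pagerank_skip_converged(out_links, n, d=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank; once a vertex's rank change drops below tol,
    it is frozen and skipped in later iterations (an approximation that
    trades a little accuracy for less work per iteration)."""
    r = np.full(n, 1.0 / n)
    converged = np.zeros(n, dtype=bool)
    in_lists = [[] for _ in range(n)]          # build reverse adjacency
    for u, vs in out_links.items():
        for v in vs:
            in_lists[v].append(u)
    deg = {u: len(vs) for u, vs in out_links.items()}
    for _ in range(max_iter):
        r_new = r.copy()
        for v in range(n):
            if converged[v]:
                continue                        # skip already-converged vertices
            r_new[v] = (1 - d) / n + d * sum(r[u] / deg[u] for u in in_lists[v])
        converged |= np.abs(r_new - r) < tol    # mark newly converged vertices
        r = r_new
        if converged.all():
            break
    return r

# Tiny 4-vertex example; every vertex links somewhere, so no dangling nodes.
ranks = pagerank_skip_converged({0: [1], 1: [2], 2: [0, 3], 3: [0]}, 4)
print(ranks, ranks.sum())  # ranks sum to ~1
```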
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self Service for Data Analysts"
1. Informatica Intelligent Data Lake: Self Service for Data Analysts
February 2017
Sören Eickhoff, Sales Consultant Central Europe, SEickhoff@informatica.com
3. Use Case: Data Lake / Data Platform Reference Architecture
- Landing Zone: structured and unstructured enterprise and external data is landed in its raw form, normalized and ready for use.
- Discovery Zone: user sandbox for self-serve access to data for exploration, data blending, hypothesis testing, analytics, and collaboration.
- Production Zone: sanitized transactional, master, and reference data and enriched data models certified for enterprise use.
Personas across the zones: Data Scientist, Data Analyst, Business, Data Steward, Data Modeler, Data Engineer.
Source types: machine/device and cloud data, documents and emails, relational and mainframe systems, social media and web logs.
Target outcomes: improve predictive maintenance, increase operational efficiency, increase customer loyalty, reduce security risk, improve fraud detection.
4. Challenges Faced by the Business and IT Today
Data Analysts:
- Can’t easily find trusted data
- Limited access to the data
- Frustrated by slow response from IT due to long backlog
- Constrained by disparate desktop tools and manual steps
- No way to collaborate, share, and update curated datasets
IT:
- Can’t cope with growing demand from the business
- No visibility into what the business is doing with the data
- Struggling to deliver value to the business
- Losing the ability to govern and manage data as an asset
5. Informatica Data Lake Management
End-user modules: Enterprise Information Catalog (for the data architect/steward), Intelligent Data Lake (for the data scientist/analyst), and Secure@Source (for the InfoSec analyst), with Big Data Management and Intelligent Streaming for the data engineer.
Platform foundation: Live Data Map (metadata integration, backed by the Titan graph store) and Big Data Management (data integration, with the Blaze engine).
6. Enterprise Information Catalog: unified view into enterprise information assets
- Business-user oriented solution
- Semantic search with dynamic facets
- Detailed lineage and impact analysis
- Business Glossary integration
- Relationship discovery
- High-level data profiling
- Automatic classification with data domains
- Business classification with custom attributes
- Broad metadata source connectivity
- Big data scale
7. Intelligent Data Lake: self-service data preparation with collaborative data governance
- Collaborative project workspaces
- Automated data ingestion
- Search of the data asset catalog
- Rapid blending of datasets
- Crowd-sourced data asset tagging and data sharing
- Automated data asset discovery and recommendations
- Rapid “industrialization” of preparation steps into re-usable workflows
- Complete tracking of usage, lineage, and security
- Easy support for data discovery platforms
8. Secure@Source: enterprise-wide visibility into sensitive data risks
- Sensitive data classification and discovery
- Sensitive data proliferation analysis
- Who has access to sensitive data
- User activity on sensitive data
- Policy-based alerting on sensitive data
- Multi-factor risk scoring
- Identification of highest-risk areas
- Integrates data security information from 3rd parties: data stores, owner, classification; protection status; user access info (LDAP, IAM) and activity logs (DB, Hadoop, Salesforce, DAM)
9. Big Data Management: easily integrate more data faster from more data sources
Architecture: capabilities for data connectivity, data integration, data masking, data quality, and data governance run through a Smart Executor, which dispatches work either to the Informatica Data Transformation Engine on dedicated ETL/DI servers or to the Hadoop cluster (YARN, HDFS), where cluster-aware engines execute it on MapReduce, Hive on MapReduce or Tez, Spark, or Blaze.
- Visual development interface accelerates developer productivity
- Near-universal data connectivity
- Complex data parsing on Hadoop
- Data profiling on Hadoop
- High-speed data ingestion and extraction
- Process and deliver data at scale on Hadoop
- Dynamic schemas and mapping templates
- Data quality and data governance on Hadoop
10. Take Big Data Management to the Next Level
- Improve developer productivity with dynamic mappings that re-use PowerCenter and SQL logic (generic source, rule-based logic, generic target)
- Automatically profit from new technologies and choose the best execution option (MapReduce, Spark, or Blaze) with the Smart Optimizer
11. Informatica Intelligent Streaming: collect, ingest, and process data in real time and streaming
- Adds a streaming analytics capability to the Intelligent Data Platform
- Unified UI with multiple engines underneath the covers
- Frictionless integration, conversion, and extension of batch mappings into a streaming context
- Abstracted from the runtime framework: a mapping from a real-time source through a window transformation to a real-time target is generated as Spark Streaming code
14. How? The Intelligent Data Lake approach
Sources and tools (applications and databases, Internet of Things, 3rd-party data, data modeling tools, BI tools, cloud, custom) connect through a data access and metadata connectivity layer into the Intelligent Metadata Foundation, which catalogs, indexes, and classifies assets to derive data lineage, data relationships, smart domains, and data profiles. On top of this foundation, the data analyst/scientist works through the data discovery and analysis process: discover, recommend, collaborate, prepare, publish, and operationalize/monitor.
15. Intelligent Data Lake terminology
- Data Asset: data you work with as a unit.
- Project: contains data assets and worksheets.
- Recipe: the steps taken to prepare data in a worksheet.
- Data Publication: the process of making prepared data available in the data lake.
- Data Preparation: the process of combining, cleansing, transforming, and structuring data from one or more data assets so that it is ready for analysis.
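To make the relationships between these terms concrete, here is a hypothetical, simplified Python model of the terminology; the class and field names are illustrative assumptions, not Informatica's actual object model:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataAsset:
    name: str      # a unit of data, e.g. a table or file
    source: str    # where the asset lives in the lake

@dataclass
class RecipeStep:
    operation: str  # e.g. "filter", "join", "aggregate"
    params: dict

@dataclass
class Recipe:
    steps: List[RecipeStep] = field(default_factory=list)  # ordered prep steps

@dataclass
class Project:
    name: str
    assets: List[DataAsset] = field(default_factory=list)
    recipe: Recipe = field(default_factory=Recipe)

    def publish(self) -> DataAsset:
        """Data publication: run the recipe over the full data and register
        the result back in the lake as a new asset (sketched, not executed)."""
        return DataAsset(name=f"{self.name}_prepared", source="production_zone")
```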
16. Search and Discovery
Data discovery through a powerful search engine to find relevant data: semantic search, with facet filtering by asset, resource type, latest, size, custom attributes, and more.
17. Data Asset Overview
Overview with asset attributes and integrated profiling stats:
- Asset attributes collected from the source system
- Asset attributes enriched by users to add business context
- Column profiling stats, including null/unique/duplicate percentages, inferred data types, and data domains; detailed stats include value and pattern distributions
- Add a data asset to a project from any exploration view
18. Business Glossary Integration
View Business Glossary assets such as terms, policies, and categories in the catalog, and view and navigate to related technical and business assets in the catalog.
19. Data Lineage
Interactively trace data origin through summarized lineage views for analysts. Use the lineage and impact sliders to drill down to the desired lineage levels on either side of the seed object.
20. Relationship View
Shows the ecosystem of the asset in the enterprise based on its associations to other assets:
- Get a 360-degree view of a data asset using the relationship view, including related tables, views, domains, reports, users, etc.
- Zoom, find specific assets in the view, and filter by asset type
- Expand relationship circles to get more details on relationship types and objects
21. Data Preparation continued…
Excel-based data preparation on sample data:
- New formula definition with type-ahead
- A large number of functions available for all types of data: string, numeric, date, statistical, math, etc.
- Advanced functionality such as join, merge, aggregate, filter, and sort
- New values are calculated and shown right away
22. Data Preparation continued…
Excel-based data preparation on sample data
• Column-level summary
• Column value distributions
• Column-level suggestions
• Data preparation steps captured as a “Recipe” (see the sketch below)
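A recipe is essentially an ordered, replayable list of preparation steps. A hedged Python sketch of that idea, with step names and sample data invented for illustration:

# Each preparation action on the worksheet is captured as an ordered step, so
# the same sequence can later be replayed on the full data set at scale.
import pandas as pd

recipe = [
    ("drop_null_emails", lambda df: df.dropna(subset=["email"])),
    ("uppercase_country", lambda df: df.assign(country=df["country"].str.upper())),
    ("positive_amounts", lambda df: df[df["amount"] > 0]),
]

def apply_recipe(df: pd.DataFrame, steps) -> pd.DataFrame:
    for name, step in steps:
        df = step(df)
        print(f"applied step: {name} -> {len(df)} rows")
    return df

sample = pd.DataFrame({
    "email": ["a@x.com", None, "c@x.com"],
    "country": ["us", "de", "fr"],
    "amount": [10.0, 5.0, -1.0],
})
prepared = apply_recipe(sample, recipe)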
23. Data Publication
Execution of data preparation steps on actual data using an Informatica mapping
• Publish the output of the data preparation steps back to the lake
• Recipe steps are translated into an Informatica mapping
• The Informatica mapping is handed over to the BDM platform for execution on the actual data sources
• The BDM platform uses MapReduce, Blaze, or Spark to execute the mapping
• The mapping is available for ETL specialists to open in the Informatica Developer tool and operationalize
• Users’ credentials are used to access the underlying database
24. Organizations need ONE solution that helps them…
• Easily find & catalog data & discover relationships
• Rapidly prepare & share data exactly when it is needed
• Get instant access to trusted & secure data for advanced analytics
• Ingest, cleanse, integrate & protect data at scale
If your customer thinks of Informatica as an ETL company, this is a chance to change their perception. We are the #1 leader in six important data categories:
First, cloud data management – we have a full portfolio of data management services for all the major cloud ecosystems, either cloud-only or hybrid.
Data integration – our bread and butter – we have been the best at it for a long time, and we continue to set the bar.
Big Data Management – we are the leader in data management for Big Data platforms. We work closely with all the major Hadoop and NoSQL ecosystems and with all the latest Big Data technologies, such as Spark.
Master Data Management – we are the leader in MDM for customer data and any other data that is important to the business. Our secret sauce is our matching engine, our ability to discover relationships, and our scalability. We can do this on any data platform, either on-premises or in the cloud.
Data Quality – we are setting the bar in DQ, whether for stand-alone initiatives such as data governance or for embedding data quality into business processes.
Data Security – we are pioneering a new approach to security. Security remains an unsolved problem, and we can address it at the data level.
Most organizations are building out some version of a data lake or enterprise data hub concept.
Really, they are looking to get all their data into one place for the next generation of analytics and to give everyone access to information.
Data lakes are usually divided into multiple types of zones.
To serve these market trends best, Informatica developed a Big Data solution that addresses each of them.
The EIC module helps people understand the data they are looking at by providing context.
The IDL module allows the business to be more self-service by providing self-service data preparation capabilities, while also helping IT operationalize the data preparation steps at scale in a managed and governed way.
Secure@Source gives insight into potential risks around privacy-sensitive data by showing where this data is located, how it proliferates across the data lake (and surrounding applications), and what the associated risks are.
Big Data Management helps customers ingest, parse, cleanse, integrate, and deliver big data at scale.
Finally, Intelligent Streaming allows processes to use real-time and streaming data sources.
All this functionality is built as part of the Intelligent Data Platform, where we use as many open source tools as possible, leveraging the power of the ecosystem.
We use HBase to store different types of metadata, and Titan as a graph database to store the relationship information between data assets. We use Spark (including Spark Streaming) and Blaze to process data at scale, Kafka as a high-speed data transfer mechanism, and Solr to index metadata so it can be searched through a Google-like search interface (sketched below).
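As a rough illustration of the Solr piece only (not Informatica's actual indexing pipeline), here is a minimal sketch using the pysolr client; the Solr URL, core name, and document fields are assumptions made for the example:

# Push asset metadata into a Solr index so it can be found via free-text search.
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/asset_metadata", timeout=10)

solr.add([
    {"id": "hive.sales.orders", "name": "orders", "type": "hive_table",
     "description": "Raw order events landed from the CRM system"},
    {"id": "hdfs./data/raw/clicks", "name": "clicks", "type": "hdfs_file",
     "description": "Clickstream log files"},
])
solr.commit()

# Google-like free-text search over the indexed metadata.
for hit in solr.search("order events"):
    print(hit["id"], hit["name"])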
The Enterprise Information Catalog (EIC) application allows business users to quickly find all information around the collection of data assets in their data lake.
Since EIC can leverage the metadata provided by Cloudera Navigator, we can even show the Hive/Impala scripts and Pig scripts that are being used to process data.
Intelligent Data Lake provides capabilities that enable business users to do data preparation.
Secure@Source gives insight into sensitive-data risks.
Dynamic Mappings
Build a template once – automate mapping execution for thousands of sources with different schemas automatically.
Mappings self-adjust dynamically to external schema changes and column characteristics.
Ability to process flat files with a changing column order (a,b,c or c,a,b) and a changing number of columns dynamically (see the sketch below).
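The essence of that claim is binding columns by name rather than by position. A small Python sketch of the idea (file contents and column names are invented; the product's dynamic mappings are far more general):

# Bind columns by header name, not position, so files arriving as (a,b,c) or
# (c,a,b) -- or with extra columns -- all flow through the same logic.
import csv
import io

REQUIRED = ["customer_id", "amount", "currency"]

def load_rows(fileobj):
    """Yield rows keyed by header name, whatever the column order."""
    for row in csv.DictReader(fileobj):
        missing = [c for c in REQUIRED if c not in row]
        if missing:
            raise ValueError(f"schema drift: missing columns {missing}")
        yield {c: row[c] for c in REQUIRED}  # project only the needed columns

file_v1 = io.StringIO("customer_id,amount,currency\n42,9.99,USD\n")
file_v2 = io.StringIO("currency,extra,customer_id,amount\nEUR,x,7,1.50\n")

for f in (file_v1, file_v2):
    print(list(load_rows(f)))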
Re-use PowerCenter and SQL logic
Many customers have existing investments in traditional PowerCenter and/or SQL scripts. To allow re-use of these components, Informatica provides capabilities to migrate existing PowerCenter logic to run on Hadoop and to convert existing SQL code into Big Data mapping logic that can be executed at scale.
Smart Optimizer
Built-in mapping optimizer that automatically tunes and re-arranges the mapping for high performance.
Early selection, early projection, mapping pruning, semi-joins, join re-ordering, etc. (a toy illustration follows below).
Automatic partitioning support based on statistics and other heuristics.
Advanced full pushdown optimization support, including data ship join.
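To make "early selection" concrete, here is a toy Python rewrite that hoists a filter below a join so fewer rows reach the join; the plan representation is invented for illustration and bears no relation to the product's internal optimizer:

# Toy "early selection" rule: a filter sitting above a join is pushed down to
# the side of the join its predicate refers to.
def push_filter_below_join(plan):
    # plan shape: ("filter", predicate_side, ("join", left, right))
    op, side, child = plan
    if op == "filter" and isinstance(child, tuple) and child[0] == "join":
        _, left, right = child
        if side == "left":
            return ("join", ("filter", "left", left), right)
        return ("join", left, ("filter", "right", right))
    return plan

naive = ("filter", "left", ("join", "orders", "customers"))
print(push_filter_below_join(naive))
# -> ('join', ('filter', 'left', 'orders'), 'customers')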
Intelligent Streaming aims to bring the following capabilities into the Informatica Platform:
Real-time data ingestion from streaming data sources
Rule evaluation and event triggering on a real-time data stream (see the sketch below)
Real-time data integration: complex transforms, lookups, joins, etc. in real time
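Rule evaluation on a stream boils down to testing each arriving event against registered predicates and firing an action on a match. A minimal sketch (rules, events, and the alert action are invented):

# Each incoming event is tested against registered rules; matches trigger an
# action (here just a print, standing in for an alert or downstream event).
from typing import Callable, Dict, List, Tuple

rules: List[Tuple[str, Callable[[Dict], bool]]] = [
    ("high_value_order",
     lambda e: e.get("type") == "order" and e.get("amount", 0) > 1000),
    ("failed_login_burst",
     lambda e: e.get("type") == "login" and e.get("failures", 0) >= 5),
]

def evaluate(event: Dict) -> None:
    for name, predicate in rules:
        if predicate(event):
            print(f"rule fired: {name} on {event}")

stream = [
    {"type": "order", "amount": 2500},
    {"type": "login", "failures": 6},
    {"type": "order", "amount": 10},
]
for event in stream:
    evaluate(event)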
Data stewards are responsible for strategically managing data assets in the data lake and the enterprise, ensuring high levels of data quality, integrity, availability, trustworthiness, and security while emphasizing the business value of data. By building a catalog, classifying metadata and data definitions, maintaining technical and business rules, and monitoring data quality, data stewards ensure the data in the lake is consistent for use in the discovery zone and enterprise zone. As the inventory of technical and business metadata is established and data sets become available, data architects must design a robust, scalable data lake architecture that meets the business goals of the marketing data lake.
Before we dive into the demo, let’s look at some terminology; I will be using these terms quite a bit in the demo:
Data Lake
A data lake is a centralized repository of large volumes of structured and unstructured data. A data lake can contain different types of data, including raw data, refined data, master data, transactional data, log file data, and machine data. In Intelligent Data Lake, the data lake is a Hadoop cluster.
Data Asset A data asset is data that you work with as a unit. Data assets can include items such as a flat file, table, or view. A data asset can include data stored in or outside the data lake.
Project A project is a container that stores data assets and worksheets.
Data Preparation The process of combining, cleansing, transforming, and structuring data from one or more data assets so that it is ready for analysis.
Recipe A recipe includes the list of input sources and the steps taken to prepare data in a worksheet.
Data Publication Data publication is the process of making prepared data available in the data lake. When you publish prepared data, Intelligent Data Lake writes the transformed input source to a Hive table in the data lake. Other analysts can add the published data to their projects and create new data assets, or they can use a third-party business intelligence or advanced analytics tool to run reports that further analyze the published data.
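In plain Spark terms, the publication step described above is roughly the following; the table and column names are hypothetical, and the product wraps this in its own mapping execution rather than hand-written code:

# Persist prepared data as a Hive table in the lake, where other analysts and
# BI tools can pick it up.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("publish-prepared-data")
         .enableHiveSupport()
         .getOrCreate())

prepared = spark.createDataFrame(
    [("a@x.com", "US", 10.0), ("c@x.com", "FR", 7.5)],
    ["email", "country", "amount"],
)

prepared.write.mode("overwrite").saveAsTable("lake.prepared_orders")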