In this webinar, we’ll show you how Cloudera SDX reduces the complexity in your data management environment and lets you deliver diverse analytics with consistent security, governance, and lifecycle management against a shared data catalog.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
Many had dubbed 2020 as the decade of data. This is indeed an era of data zeitgeist.
From code-centric software development 1.0, we are entering software development 2.0, a data-centric and data-driven approach, where data plays a central theme in our everyday lives.
As the volume and variety of data garnered from myriad data sources continue to grow at an astronomical scale and as cloud computing offers cheap computing and data storage resources at scale, the data platforms have to match in their abilities to process, analyze, and visualize at scale and speed and with ease — this involves data paradigm shifts in processing and storing and in providing programming frameworks to developers to access and work with these data platforms.
In this talk, we will survey some emerging technologies that address the challenges of data at scale, how these tools help data scientists and machine learning developers with their data tasks, why they scale, and how they facilitate the future data scientists to start quickly.
In particular, we will examine in detail two open-source tools MLflow (for machine learning life cycle development) and Delta Lake (for reliable storage for structured and unstructured data).
Other emerging tools such as Koalas help data scientists to do exploratory data analysis at scale in a language and framework they are familiar with as well as emerging data + AI trends in 2021.
You will understand the challenges of machine learning model development at scale, why you need reliable and scalable storage, and what other open source tools are at your disposal to do data science and machine learning at scale.
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
If there were a buzzword of the hour, it would certainly be "data mesh"! This new architectural paradigm unlocks analytic data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
As such, the data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a data mesh infrastructure must be real-time, decoupled, reliable, and scalable.
This presentation explores how Apache Kafka, as an open and scalable decentralized real-time platform, can be the basis of a data mesh infrastructure and - complemented by many other data platforms like a data warehouse, data lake, and lakehouse - solve real business problems.
There is no silver bullet or single technology/product/cloud service for implementing a data mesh. The key outcome of a data mesh architecture is the ability to build data products; with the right tool for the job.
A good data mesh combines data streaming technology like Apache Kafka or Confluent Cloud with cloud-native data warehouse and data lake architectures from Snowflake, Databricks, Google BigQuery, et al.
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
Many had dubbed 2020 as the decade of data. This is indeed an era of data zeitgeist.
From code-centric software development 1.0, we are entering software development 2.0, a data-centric and data-driven approach, where data plays a central theme in our everyday lives.
As the volume and variety of data garnered from myriad data sources continue to grow at an astronomical scale and as cloud computing offers cheap computing and data storage resources at scale, the data platforms have to match in their abilities to process, analyze, and visualize at scale and speed and with ease — this involves data paradigm shifts in processing and storing and in providing programming frameworks to developers to access and work with these data platforms.
In this talk, we will survey some emerging technologies that address the challenges of data at scale, how these tools help data scientists and machine learning developers with their data tasks, why they scale, and how they facilitate the future data scientists to start quickly.
In particular, we will examine in detail two open-source tools MLflow (for machine learning life cycle development) and Delta Lake (for reliable storage for structured and unstructured data).
Other emerging tools such as Koalas help data scientists to do exploratory data analysis at scale in a language and framework they are familiar with as well as emerging data + AI trends in 2021.
You will understand the challenges of machine learning model development at scale, why you need reliable and scalable storage, and what other open source tools are at your disposal to do data science and machine learning at scale.
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
If there were a buzzword of the hour, it would certainly be "data mesh"! This new architectural paradigm unlocks analytic data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
As such, the data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a data mesh infrastructure must be real-time, decoupled, reliable, and scalable.
This presentation explores how Apache Kafka, as an open and scalable decentralized real-time platform, can be the basis of a data mesh infrastructure and - complemented by many other data platforms like a data warehouse, data lake, and lakehouse - solve real business problems.
There is no silver bullet or single technology/product/cloud service for implementing a data mesh. The key outcome of a data mesh architecture is the ability to build data products; with the right tool for the job.
A good data mesh combines data streaming technology like Apache Kafka or Confluent Cloud with cloud-native data warehouse and data lake architectures from Snowflake, Databricks, Google BigQuery, et al.
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
Recently, a set of modern table formats such as Delta Lake, Hudi, Iceberg spring out. Along with Hive Metastore these table formats are trying to solve problems that stand in traditional data lake for a long time with their declared features like ACID, schema evolution, upsert, time travel, incremental consumption etc.
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
In essence, a data lake is commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides means to ingest data, perform scalable big data processing, and serve information, in addition to manage, monitor and secure the it environment. In these slide, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture if the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build Azure Data Factory pipeline to ingest the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache SparkTM, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
Building Data Quality pipelines with Apache Spark and Delta LakeDatabricks
Technical Leads and Databricks Champions Darren Fuller & Sandy May will give a fast paced view of how they have productionised Data Quality Pipelines across multiple enterprise customers. Their vision to empower business decisions on data remediation actions and self healing of Data Pipelines led them to build a library of Data Quality rule templates and accompanying reporting Data Model and PowerBI reports.
With the drive for more and more intelligence driven from the Lake and less from the Warehouse, also known as the Lakehouse pattern, Data Quality at the Lake layer becomes pivotal. Tools like Delta Lake become building blocks for Data Quality with Schema protection and simple column checking, however, for larger customers they often do not go far enough. Notebooks will be shown in quick fire demos how Spark can be leverage at point of Staging or Curation to apply rules over data.
Expect to see simple rules such as Net sales = Gross sales + Tax, or values existing with in a list. As well as complex rules such as validation of statistical distributions and complex pattern matching. Ending with a quick view into future work in the realm of Data Compliance for PII data with generations of rules using regex patterns and Machine Learning rules based on transfer learning.
Presentation on Data Mesh: The paradigm shift is a new type of eco-system architecture, which is a shift left towards a modern distributed architecture in which it allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
As organizations pursue Big Data initiatives to capture new opportunities for data-driven insights, data governance has become table stakes both from the perspective of external regulatory compliance as well as business value extraction internally within an enterprise. This session will introduce Apache Atlas, a project that was incubated by Hortonworks along with a group of industry leaders across several verticals including financial services, healthcare, pharma, oil and gas, retail and insurance to help address data governance and metadata needs with an open extensible platform governed under the aegis of Apache Software Foundation. Apache Atlas empowers organizations to harvest metadata across the data ecosystem, govern and curate data lakes by applying consistent data classification with a centralized metadata catalog.
In this talk, we will present the underpinnings of the architecture of Apache Atlas and conclude with a tour of governance capabilities within Apache Atlas as we showcase various features for open metadata modeling, data classification, visualizing cross-component lineage and impact. We will also demo how Apache Atlas delivers a complete view of data movement across several analytic engines such as Apache Hive, Apache Storm, Apache Kafka and capabilities to effectively classify, discover datasets.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Matt Stubbs
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 12:30 - 13:00
Organisation: Cloudera
About: The growth of public cloud is reinforcing the need to think more carefully about taking a consistent approach to data governance as technology teams build out a flexible and agile infrastructure to meet the demands of the business.
Join this session to learn more about Cloudera's recommended approach for enterprise-grade security and governance and how to ensure a consistent framework across private, public and on-premises environments.
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
Enterprise data architectures usually contain many systems—data lakes, message queues, and data warehouses—that data must pass through before it can be analyzed. Each transfer step between systems adds a delay and a potential source of errors. What if we could remove all these steps? In recent years, cloud storage and new open source systems have enabled a radically new architecture: the lakehouse, an ACID transactional layer over cloud storage that can provide streaming, management features, indexing, and high-performance access similar to a data warehouse. Thousands of organizations including the largest Internet companies are now using lakehouses to replace separate data lake, warehouse and streaming systems and deliver high-quality data faster internally. I’ll discuss the key trends and recent advances in this area based on Delta Lake, the most widely used open source lakehouse platform, which was developed at Databricks.
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
Recently, a set of modern table formats such as Delta Lake, Hudi, Iceberg spring out. Along with Hive Metastore these table formats are trying to solve problems that stand in traditional data lake for a long time with their declared features like ACID, schema evolution, upsert, time travel, incremental consumption etc.
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
In essence, a data lake is commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides means to ingest data, perform scalable big data processing, and serve information, in addition to manage, monitor and secure the it environment. In these slide, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture if the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build Azure Data Factory pipeline to ingest the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache SparkTM, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
Building Data Quality pipelines with Apache Spark and Delta LakeDatabricks
Technical Leads and Databricks Champions Darren Fuller & Sandy May will give a fast paced view of how they have productionised Data Quality Pipelines across multiple enterprise customers. Their vision to empower business decisions on data remediation actions and self healing of Data Pipelines led them to build a library of Data Quality rule templates and accompanying reporting Data Model and PowerBI reports.
With the drive for more and more intelligence driven from the Lake and less from the Warehouse, also known as the Lakehouse pattern, Data Quality at the Lake layer becomes pivotal. Tools like Delta Lake become building blocks for Data Quality with Schema protection and simple column checking, however, for larger customers they often do not go far enough. Notebooks will be shown in quick fire demos how Spark can be leverage at point of Staging or Curation to apply rules over data.
Expect to see simple rules such as Net sales = Gross sales + Tax, or values existing with in a list. As well as complex rules such as validation of statistical distributions and complex pattern matching. Ending with a quick view into future work in the realm of Data Compliance for PII data with generations of rules using regex patterns and Machine Learning rules based on transfer learning.
Presentation on Data Mesh: The paradigm shift is a new type of eco-system architecture, which is a shift left towards a modern distributed architecture in which it allows domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
As organizations pursue Big Data initiatives to capture new opportunities for data-driven insights, data governance has become table stakes both from the perspective of external regulatory compliance as well as business value extraction internally within an enterprise. This session will introduce Apache Atlas, a project that was incubated by Hortonworks along with a group of industry leaders across several verticals including financial services, healthcare, pharma, oil and gas, retail and insurance to help address data governance and metadata needs with an open extensible platform governed under the aegis of Apache Software Foundation. Apache Atlas empowers organizations to harvest metadata across the data ecosystem, govern and curate data lakes by applying consistent data classification with a centralized metadata catalog.
In this talk, we will present the underpinnings of the architecture of Apache Atlas and conclude with a tour of governance capabilities within Apache Atlas as we showcase various features for open metadata modeling, data classification, visualizing cross-component lineage and impact. We will also demo how Apache Atlas delivers a complete view of data movement across several analytic engines such as Apache Hive, Apache Storm, Apache Kafka and capabilities to effectively classify, discover datasets.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Matt Stubbs
Date: 14th November 2018
Location: Data-Driven Ldn Theatre
Time: 12:30 - 13:00
Organisation: Cloudera
About: The growth of public cloud is reinforcing the need to think more carefully about taking a consistent approach to data governance as technology teams build out a flexible and agile infrastructure to meet the demands of the business.
Join this session to learn more about Cloudera's recommended approach for enterprise-grade security and governance and how to ensure a consistent framework across private, public and on-premises environments.
It’s becoming clear that enterprises need more than one cloud. Hybrid enables enterprises to optimize how their business works – public cloud for elasticity and scale, multi-cloud for redundancy and choice, and on-premises for performance and privacy. Cloudera delivers a hybrid cloud solution that works where enterprises work, with the agility, security and governance enterprise IT needs, and the self-service analytics business people and enterprise data professionals demand. In this session, we will talk about how Cloudera helps deliver hybrid solutions for enterprises and will run a hands-on Cloudera PaaS demo to exhibit:
- Altus Environment Setup
- Configure Altus SDX
- Spin-up transient clusters with Altus
- Execute workload on Altus Data Engineering clusters
- Run interactive queries on object store with Altus Data Warehouse
- Job Analytics with Workload Experience Manager (WXM)
Speaker: Junaid Rao, Senior Cloud Sales Engineer, Cloudera
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
Machine learning and analytics applications are exploding in the enterprise; driving use cases for preventative maintenance, delivering new desirable product offers to customers at the right time, and combating insider threats to your business.
But each of these high-value use cases rely on a variety of data analysis capabilities working in concert to combine data from different sources into a single coherent picture. Cloudera SDX delivers a “shared data experience” that makes applications easier to develop, less expensive to deploy and more consistently secure.
3 things to learn:
* Why multi-function applications are difficult to build and secure
* How shared catalog, governance, management, and security applied consistently everywhere can deliver a “shared data experience”
* How enterprise customers are building new, high-value applications with SDX
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
Maschinelles Lernen und Analyseanwendungen explodieren im Unternehmen und ermöglichen Anwendungsfällen in Bereichen wie vorbeugende Wartung, Bereitstellung neuer, wünschenswerter Produktangebote für Kunden zum richtigen Zeitpunkt und Bekämpfung von Insider-Bedrohungen für Ihr Unternehmen.
A deep dive into running data analytic workloads in the cloudCloudera, Inc.
Aishwarya Venkataraman, Jason Wang, Mala Ramakrishnan, Stefan Salandy, and Vinithra Varadharajan lead a deep dive into running data analytic workloads in a managed service capacity in the public cloud and highlight cloud infrastructure best practices.
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera, Inc.
Neueste Studien zeigen, dass Data Scientisten und Analysten bis zu 80% ihrer Zeit dafür nutzen, Daten zu reinigen und vorzubereiten.
Eine ohnehin schon zeitaufwändige Aufgabe kann in der Cloud noch weiter erschwert werden, da das Cluster Management und Operations die Komplexität noch erhöhen.
Nutzer wünschen sich daher, diese komplexen Workflows zu vereinheitlichen und zu vereinfachen.
Um Big Data und Machine Learning Initiativen voranzutreiben, benötigen Unternehmen eine skalierbare und überall verfügbare Plattform. Diese muss Self-Service ermöglichen und Datensilos eliminieren.
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Cloudera - The Modern Platform for AnalyticsCloudera, Inc.
This presentation provides an overview of Cloudera and how a modern platform for Machine Learning and Analytics better enables a data-driven enterprise.
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloudera, Inc.
This webinar will help you maximize the full potential of the cloud. Understand how to leverage cloud environments for different analytic workloads to empower business analysts and keep IT happy. An intricate, beautiful balance. The learn best practices in design, performance tuning, workload considerations, and hybrid or multi-cloud strategies.
This deck covers key considerations and provides advice for enterprises looking to run production-scale Cloudera on AWS. We touch on everything from security to governance to selecting the right instance type for your Hadoop workload (Spark, Impala, Search, etc).
Self-service Big Data Analytics on Microsoft AzureCloudera, Inc.
In this presentation Microsoft will join Cloudera to introduce a new Platform-as-a-Service (PaaS) offering that helps data engineers use on-demand cloud infrastructure to speed the creation and operation of data pipelines that power sophisticated, data-driven applications - without onerous administration.
Cloudera GoDataFest Deploying Cloudera in the CloudGoDataDriven
Cloud can offer flexibility and agility to your clusters. Learn more about how to deploy Cloudera in the cloud, the best practices for long-running clusters and transient clusters. And see how easy it is to spin up both kind of clusters in the cloud with Altus and Director.
Cloud-Native Machine Learning: Emerging Trends and the Road AheadDataWorks Summit
Big data platforms are being asked to support an ever increasing range of workloads and compute environments, including large-scale machine learning and public and private clouds. In this talk, we will discuss some emerging capabilities around cloud-native machine learning and data engineering, including running machine learning and Spark workloads directly on Kubernetes, and share our vision of the road ahead for ML and AI in the cloud.
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Cloudera Altus: Big Data in the Cloud Made EasyCloudera, Inc.
Cloudera Altus makes it easier for data engineers, ETL developers, and anyone who regularly works with raw data to process that data in the cloud efficiently and cost effectively. In this webinar we introduce our new platform-as-a-service offering and explore challenges associated with data processing in the cloud today, how Altus abstracts cluster overhead to deliver easy, efficient data processing, and unique features and benefits of Cloudera Altus.
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Cloudera, Inc.
Le cloud public est une proposition attractive pour les entreprises à la recherche d’agilité dans leurs projets big data, qu’il s’agisse de traiter des données en masse ou d’y exécuter des analyses complexes pour une meilleure prise de décision.
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
Big data journey to the cloud 5.30.18 asher bartchCloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
This annual program recognizes organizations who are moving swiftly towards the future and building innovative solutions by making what was impossible yesterday, possible today.
The winning organizations' implementations demonstrate outstanding achievements in fulfilling their mission, technical advancement, and overall impact.
The 2021 Data Impact Awards recognize organizations' achievements with the Cloudera Data Platform in seven categories:
Data Lifecycle Connection
Data for Enterprise AI
Cloud Innovation
Security & Governance Leadership
People First
Data for Good
Industry Transformation
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most-cutting edge data projects and represent innovation and leadership in their respective industries.
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to an ad-hoc exploration of all data to optimize business processes and into the unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers such as -
-Powerful data ingestion powered by Apache NiFi
-Edge data collection by Apache MiNiFi
-IoT-scale streaming data processing with Apache Kafka
-Enterprise services to offer unified security and governance from edge-to-enterprise
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
Cloudera SDX is by no means no restricted to just the platform; it extends well beyond. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
451 Research Analyst Sheryl Kingstone, and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
Workload Experience Manager (XM) gives you the visibility necessary to efficiently migrate, analyze, optimize, and scale workloads running in a modern data warehouse. In this recorded webinar we discuss common challenges running at scale with modern data warehouse, benefits of end-to-end visibility into workload lifecycles, overview of Workload XM and live demo, real-life customer before/after scenarios, and what's next for Workload XM.
Get started with Cloudera's cyber solutionCloudera, Inc.
Cloudera empowers cybersecurity innovators to proactively secure the enterprise by accelerating threat detection, investigation, and response through machine learning and complete enterprise visibility. Cloudera’s cybersecurity solution, based on Apache Spot, enables anomaly detection, behavior analytics, and comprehensive access across all enterprise data using an open, scalable platform. But what’s the easiest way to get started?
Spark and Deep Learning Frameworks at Scale 7.19.18Cloudera, Inc.
We'll outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, along with its extended ecosystem of libraries and deep learning frameworks using Cloudera's Data Science Workbench.
The General Data Protection Regulation (GDPR) went into effect on May 25, 2018, and this has immediate implications for handling data in your big data, machine learning, and analytics environments. Traditional architectural approaches will need to be adjusted to be compliant with several of the provisions. The good news is that Cloudera can help you!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
20. Of course! We have our internal EDH cluster. That
would be easy!
With increased focus on … business
insights.. dashboard … FAST...
Charles,
SVP, Emerging Businesses
Mulyadi,
Data Scientist
21. Pipelines! Workloads! Queries! More
pipelines. More workloads! More queries!
Even more….
Mulyadi,
Data Scientist
Alan,
Internal EDH Data Platform
Manager
Adding more workloads to Internal EDH clusters is
risky and adds uncertainty to existing SLA-sensitive
workloads.
Multi-disciplinary analytics: vehicle to go from insight to action.
It is the foundation for all your analytics innovation
Determines how successfully you can meet both your business & operational goals
Will enable you to ask bigger questions and solve complex analytics problems
Everyone is doing it – experience with our customers
We’ve previously done webinars on just that topic, so please reach out if you’d like to receive a link to a replay for those
Requires a platform that:
Support multi-function analytics
Minimize time to add workloads
Support elastic workloads
Enable self-service
Provide a scalable model for sharing data
Reduce cost
Increase tenant isolation
Secure the environment
We’ve all been doing analytics for a very long time, for a variety of reasons and in a range of departments. So we’ve accumulated some baggage
We’ve gathered typical legacy systems. Each own application does its own thing, has it’s own data storage and also its own context.
Many data silos, each requiring its own proprietary tools and infrastructure. Each application or workload has dedicated compute, data storage and context
Different vendors, products, and services. Throughout a project, they’re all needed and in different ratios. You start off with
ETL/DE
DW/M
Analytics in SQL
NoSQL/RT
DS
Each in it’s own siloed application. And you may have multiple of each kind, of each workload.
A fragmented approach is difficult, expensive, and risky. Managing and guaranteeing security/compliance etc across the board is hard. Nothing’s shared. Context is reestablished each and every time. New application or workload, new context. Whole disciplines grown around trying to keep this all in sync
A struggle or, for the Disney fans, a tale as old as time between business and IT., More beast than beauty though
Different problems for different users.
IT
Each application or workload is single use, data sources are inflexible
See lots of redundancy of data, inefficient operating environment
Users
Can’t find data they need. Waiting on IT
Finding out of date data
Doing prep, not finding insights
Head of data and analytics
Administrative, not innovating – choosing databases rather than operationalizing
Users resort to shadow IT to work around their challenges
Organisation vs business unit
So then use a big data platform on premises. A single physical cluster. Everything in one place so the shared data experience to multiple workloads and tenants is given
Strong
multi-function support
Shared data experience
Information security model
Ok-ish, moderate
Cost management
Tenant isolation
Workload elasticity
And it’s ok to be on premises, even though
Weak
Self service
Speed of deployment
To address those two challenges, organisations are branching out and looking at alternatives to meet the business needs in those areas. Time to insight and action needs to be forever shorter: self service. New capacity is needed in the next 10m and the need is gone by tomorrow. That sounds like cloud, right?
Well, certainly right on those fronts. Cloud deployments are strong where on-premises is weak:
Strong
Tenant isolation
Workload elasticity
Self-service
Speed of deployment
But if you haven’t carefully looked at your cloud strategy, you can find yourself re-introducing context silos.
Weak
Shared data experience
Information security model
It’s compounded by the fact we’re now dealing with a mix of permanent and transient workloads. Back to square one in terms of challenges. Biggest ones around: security and governance
Irrespective of what audience, the impact on business is detrimental
The crux is the data context. To have a workload, you need data, compute power AND context
Compute and data are becoming further separated
Compute is stateless: cloud-based or on-prem, either transient or long-running
Data is stateful: cloud-based or on-prem in HDFS, Kudu, S3, ADLS, Isilon, etc.
What about data context?
Schema Definitions
Permissions
Encryption Keys
Governance
Data context should be stateful, but currently is stateless
This creates synchronization and usability challenges for admins and end users alike
So we introduced Cloudera SDX - or shared data experience – the foundations of Cloudera Enterprise.
SDX makes it possible for companies to run dozens - hundreds - of analytic applications against a common pool of data. One logical cluster provides a shared data experience to multiple workloads and tenants
SDX applies a centralized, consistent framework for catalog, security, governance, management, data ingest and more.
It makes it faster, easier, and safer for organizations, teams, people to develop and deploy high-value, multi-function use cases like customer next best offer, clinical prediction, and risk modeling.
SDX cuts through silos to unify data, analytics, management, security, and governance, and empowers self-service
It combines the strengths of on-premises and cloud only deployments:
* multi-function support
* shared data experience
* information security model
* cost management
* tenant isolation
* workload elasticity
* self service
* speed of deployment
SDX is a set of open platform services built for multi-functional or multi-disciplinary analytics that have been optimized for the cloud. This means
a shared catalog that helps to define and preserve the structure and the business context of all your data, regardless of where it happens to reside.
that we offer a unified security model that helps protect sensitive data with a consistent set of controls
that we offer a consistent governance model that enables self-service secure access to all of your relevant data. Not just one type of data, really to all of it, increasing your ability to be compliant, particularly in a regulatory environment.
Next, flexible data ingest and replication. We have a number of core partners that we work with in this arena that help you aggregate a single copy of all of your data, providing you easier debt disaster recovery and that eases migration of data from one place to another.
Finally, easy workload management that increases user productivity and boosts job predictability.
So, SDX is really a core piece of how we at Cloudera separate ourselves from the competition.
Couple of examples how SDX helps. The list is endless
Satosh
Santosh
Santosh
Speak about over time this will all become one thing
Santosh
Prefix with a couple of slides illustrating the challenge without SDX
Impact of consistency on the journey
Enables
Simplifies
Example of large European financial org:
Reduce fraud, improve accuracy from 64 to 98% and 80% faster customer service
Cluster spin up with full context and deployed to three different clouds in <1h
Three admin vs 15
Concept to production in 6 months