- The document discusses an IBM Cloud Day 2021 event focused on well-architected data lakes. It provides an overview of two sessions on data lake architecture and building a cloud native data lake on IBM Cloud.
- It also summarizes the key capabilities organizations need from a data lake, including visualizing data, flexibility/accessibility, governance, and gaining insights. Cloud data lakes can address these needs for various roles.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Operationalizing Big Data Pipelines At Scale | Databricks
Running a global, world-class business with data-driven decision making requires ingesting and processing diverse sets of data at tremendous scale. How does a company achieve this while ensuring quality and honoring their commitment as responsible stewards of data? This session will detail how Starbucks has embraced big data, building robust, high-quality pipelines for faster insights to drive world-class customer experiences.
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole | Vasu S
This ebook deep dives into Apache Spark optimizations that improve performance, reduce costs and deliver unmatched scale.
https://www.qubole.com/resources/ebooks/accelerating-time-to-value-of-big-data-of-apache-spark
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta and share our experiences from different angles like architecture, application logic and user experience. We will look at how security, cluster configuration, resource consumption and workflow changed by using Databricks clusters, as well as how using Delta tables simplified our application logic and data operations.
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De... | Databricks
Columbia is a data-driven enterprise, integrating data from all line-of-business-systems to manage its wholesale and retail businesses. This includes integrating real-time and batch data to better manage purchase orders and generate accurate consumer demand forecasts.
Introduction to the Snowflake data warehouse and its architecture for big data companies. Covers centralized data management, Snowpipe and the COPY INTO command for data loading, and stream loading and batch processing.
Databricks offers a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning Library (MLlib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Delta Lake is an open-source innovation that brings new capabilities for transactions, version control and indexing to your data lakes. We uncover how Delta Lake benefits you and why it matters. In this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which helps concurrent read/write operations and enables efficient inserts, updates, deletes, and rollback capabilities. It allows background file optimization through compaction and Z-order partitioning, achieving better performance. In this presentation, we will learn the Delta Lake benefits and how it solves common data lake challenges and, most importantly, the new Delta Time Travel capability.
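The snapshot and time travel ideas in this abstract can be illustrated with a toy model. The sketch below is not the Delta Lake API: `VersionedTable` and its methods are hypothetical names, and the whole table is kept in memory as a list of immutable snapshots. It only shows why readers of an older version are unaffected by later commits, and why a rollback can be expressed as one more commit.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy commit log (hypothetical, not Delta Lake): one snapshot per version."""
    _snapshots: list = field(default_factory=lambda: [{}])

    def commit(self, upserts=None, deletes=()):
        # Build the next snapshot from a copy of the latest one, so readers
        # holding an older version never see a half-applied change.
        current = dict(self._snapshots[-1])
        current.update(upserts or {})
        for key in deletes:
            current.pop(key, None)
        self._snapshots.append(current)
        return len(self._snapshots) - 1  # version number of the new snapshot

    def read(self, version=None):
        # Default to the latest version; pass an older number to "time travel".
        index = len(self._snapshots) - 1 if version is None else version
        return dict(self._snapshots[index])

    def rollback(self, version):
        # A rollback is recorded as a new commit that restores old contents.
        self._snapshots.append(dict(self._snapshots[version]))
        return len(self._snapshots) - 1


table = VersionedTable()
v1 = table.commit(upserts={"order-1": 10, "order-2": 25})
v2 = table.commit(upserts={"order-1": 12}, deletes=["order-2"])
```

Calling `table.read(v1)` after the second commit still returns both original orders, while `table.read()` returns only the updated `order-1`. A real table format stores file-level changes in a transaction log rather than full copies; the copy-per-version approach here is a simplification for readability.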
Azure Databricks—Apache Spark as a Service with Sascha Dittmann | Databricks
The driving force behind Apache Spark (Databricks Inc.) and Microsoft have designed a joint service to quickly and easily create Big Data and Advanced Analytics solutions. The combination of the comprehensive Databricks Unified Analytics platform and the powerful capabilities of Microsoft Azure makes it easy to analyse data streams or large amounts of data, as well as to train AI models. Sascha Dittmann shows in this session how the new Azure service can be set up and used in various real-world scenarios. He also shows how to connect the various Azure services to the Azure Databricks service.
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat... | Microsoft Tech Community
In this session you will learn how to develop data pipelines in Azure Data Factory and build a cloud-based analytical solution adopting modern data warehouse approaches with Azure SQL Data Warehouse and implementing incremental ETL orchestration at scale. With the multiple sources and types of data available in an enterprise today, Azure Data Factory enables full integration of data and direct storage in Azure SQL Data Warehouse for powerful, high-performance query workloads, which drive a majority of enterprise applications and business intelligence applications.
Spark is fast becoming a critical part of Customer Solutions on Azure. Databricks on Microsoft Azure provides a first-class experience for building and running Spark applications. The Microsoft Azure CAT team engaged with many early adopter customers helping them build their solutions on Azure Databricks.
In this session, we begin by reviewing typical workload patterns and integration with other Azure services like Azure Storage, Azure Data Lake, IoT / Event Hubs, SQL DW, Power BI, etc. Most importantly, we will share real-world tips and learnings that you can take and apply in your Data Engineering / Data Science workloads.
Add Historical Analysis of Operational Data with Easy Configurations in Fivet... | Databricks
Fivetran makes it easy to automate data ingestion particularly for operational data sources such as Salesforce, Zendesk, and Oracle Eloqua, no matter how source schemas and APIs change. Achieving historical analysis is cumbersome, time-consuming, and costly to build and maintain manually. A common approach is to include snapshots, which only take into account changes at a given time. Plus, the additional storage requirements can become unwieldy to manage. Type 2 Slowly Changing Dimension (SCD) allows you to track any change at any point in time. This session shows how Fivetran History Mode, which uses Type 2 SCD, can be easily configured and then switched on with 1-click and synchronized for a desired time period. This accelerates time to insights, making it easy to both automate data ingestion and historical analysis.
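The Type 2 SCD idea mentioned above can be sketched in a few lines. This is a generic illustration, not Fivetran's History Mode implementation: the function names, the row layout (`valid_from`, `valid_to`) and the sample account are all assumptions made for the example. Each change closes the currently open row and appends a new one, so any past state can be looked up by date.

```python
from datetime import date

OPEN_END = None  # marker meaning "this row is still the current version"


def apply_change(history, key, attributes, effective):
    """Record a change Type 2 style: close the current row, append a new one."""
    for row in history:
        if row["key"] == key and row["valid_to"] is OPEN_END:
            if row["attributes"] == attributes:
                return history  # nothing changed, keep the current row open
            row["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attributes": attributes,
         "valid_from": effective, "valid_to": OPEN_END}
    )
    return history


def as_of(history, key, when):
    """Return the attributes that were valid for `key` on the date `when`."""
    for row in history:
        ends = row["valid_to"]
        starts_before = row["valid_from"] <= when
        if row["key"] == key and starts_before and (ends is None or when < ends):
            return row["attributes"]
    return None


history = []
apply_change(history, "acct-1", {"tier": "free"}, date(2021, 1, 1))
apply_change(history, "acct-1", {"tier": "pro"}, date(2021, 6, 1))
```

With this history, `as_of(history, "acct-1", date(2021, 3, 15))` returns the `free` tier and a lookup on or after 1 June 2021 returns `pro`. Unlike periodic snapshots, storage grows only when a value actually changes.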
Azure Data Factory is one of the newer data services in Microsoft Azure and is part of the Cortana Analytics Suite, providing data orchestration and movement capabilities.
This session will describe the key components of Azure Data Factory and take a look at how you create data transformation and movement activities using the online tooling. Additionally, the new tooling that shipped with the recently updated Azure SDK 2.8 will be shown in order to provide a quickstart for your cloud ETL projects.
This presentation focuses on the value proposition for Azure Databricks for Data Science. First, the talk includes an overview of the merits of Azure Databricks and Spark. Second, the talk includes demos of data science on Azure Databricks. Finally, the presentation includes some ideas for data science production.
Build a simple data lake on AWS using a combination of services, including AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon S3.
Link to the blog post and video: https://garystafford.medium.com/building-a-simple-data-lake-on-aws-df21ca092e32
Event: Passcamp, 07.12.2017
Speaker: Stefan Kirner
More tech talks: https://www.inovex.de/de/content-pool/vortraege/
More tech articles: https://www.inovex.de/blog
ADV Slides: Building and Growing Organizational Analytics with Data Lakes | DATAVERSITY
Data lakes are providing immense value to organizations embracing data science.
In this webinar, William will discuss the value of having broad, detailed, and seemingly obscure data available in cloud storage for purposes of expanding Data Science in the organization.
You have a data lake — now it’s time to unlock its power. Register for the upcoming webinar “Unlocking the Power of the Data Lake” to learn how.
As Hadoop adoption in the enterprise continues to grow, so does commitment to the data lake strategy. Two-thirds of Database Trends and Applications readers are either implementing data lake projects this year or researching and evaluating solutions. Data security, governance, integration, and analytics have all been identified as critical success factors for data lake deployments.
To educate this growing audience about the enabling technologies and best practices for unlocking the power of the data lake, Database Trends and Applications is hosting a special roundtable webinar.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture | DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES | Matt Stubbs
Date: 13th November 2018
Location: Self-Service Analytics Theatre
Time: 14:30 - 15:00
Speaker: Zaf Khan
Organisation: Arcadia Data
About: The use of data lakes continues to grow, and a recent survey by Eckerson Group shows that organizations are getting real value from their deployments. However, there’s still a lot of room for improvement when it comes to giving business users access to the wealth of potential insights in the data lake.
While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, organizations often struggle with data lakes because they are only accessible by highly-skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives?
In this talk, we’ll discuss:
• Why traditional BI tools are architected well for data warehouses, but not data lakes.
• Why every organization should have two BI standards: one for data warehouses and one for data lakes.
• Innovative capabilities provided by BI for data lakes
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes... | Data Con LA
A tale of two BI standards: Data warehouses and data lakes by Shant Hovsepian, Co-Founder and CTO, Arcadia Data
Data lakes as part of the logical data warehouse (LDW) have entered the trough of disillusionment. Some failures are due to lack of value from businesses focusing on the big data challenges and not the big analytics opportunity. After all, data is just data until you analyze it. While the data management aspect has been fairly well understood over the years, the success of business intelligence (BI) and analytics on data lakes lags behind. In fact, data lakes often fail because they are only accessible by highly skilled data scientists and not by business users. But BI tools have been able to access data warehouses for years, so what gives? Shant Hovsepian explains why existing BI tools are architected well for data warehouses but not data lakes, the pros and cons of each architecture, and why every organization should have two BI standards: one for data warehouses and one for data lakes.
Today, data lakes are widely used and have become extremely affordable as data volumes have grown. However, they are only meant for storage and by themselves provide no direct value. With up to 80% of data stored in the data lake today, how do you unlock the value of the data lake? The value lies in the compute engine that runs on top of a data lake.
Join us for this webinar where Ahana co-founder and Chief Product Officer Dipti Borkar will discuss how to unlock the value of your data lake with the emerging Open Data Lake analytics architecture.
Dipti will cover:
-Open Data Lake analytics - what it is and what use cases it supports
-Why companies are moving to an open data lake analytics approach
-Why the open source data lake query engine Presto is critical to this approach
Data Lakehouse, Data Mesh, and Data Fabric (r1) | James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Demystifying Data Warehouse as a Service (DWaaS) | Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Data Warehouse or Data Lake, Which Do I Choose? | DATAVERSITY
Today’s data-driven companies have a choice to make: where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al) or the data lake (AWS S3 et al). There are pros and cons for each approach. While the data warehouse gives you strong data management with analytics, it doesn’t do well with semi-structured and unstructured data, it tightly couples storage and compute, and it can bring expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Estimating the Total Costs of Your Cloud Analytics Platform | DATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function Data Management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... | DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift | Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use work load management.
Feature Store as a Data Foundation for Machine Learning | Provectus
Looking to design and build a centralized, scalable Feature Store for your Data Science & Machine Learning teams to take advantage of? Come and learn from experts of Provectus and Amazon Web Services (AWS) how to!
Feature Store is a key component of the ML stack and data infrastructure, which enables feature engineering and management. By having a Feature Store, organizations can save massive amounts of resources, innovate faster, and drive ML processes at scale. In this webinar, you will learn how to build a Feature Store with a data mesh pattern and see how to achieve consistency between real-time and training features, to improve reproducibility with time-traveling for data.
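One common way to get the consistency and "time-traveling" described above is a point-in-time lookup: training reads the feature value as it was at the label's timestamp, while serving reads the latest value from the same timeline. The sketch below is a minimal in-memory illustration under that assumption; `FeatureTimeline` and its methods are hypothetical names, not part of any Provectus or AWS product.

```python
from bisect import bisect_right


class FeatureTimeline:
    """Hypothetical store for one feature of one entity, ordered by timestamp."""

    def __init__(self):
        self._times = []
        self._values = []

    def write(self, timestamp, value):
        # Keep both lists sorted by timestamp, even for late-arriving writes.
        index = bisect_right(self._times, timestamp)
        self._times.insert(index, timestamp)
        self._values.insert(index, value)

    def as_of(self, timestamp):
        # Training path: latest value written at or before `timestamp`,
        # so no information from the future leaks into a training row.
        index = bisect_right(self._times, timestamp)
        return self._values[index - 1] if index else None

    def latest(self):
        # Serving path: the most recent value on the same timeline.
        return self._values[-1] if self._values else None


purchases_7d = FeatureTimeline()
purchases_7d.write(200, 5)
purchases_7d.write(100, 3)  # late-arriving write still lands in order
```

Here `purchases_7d.as_of(150)` yields the value written at time 100, and `purchases_7d.latest()` yields the value written at time 200. Because both paths read one timeline, a training set can be rebuilt for any past moment.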
Agenda
- Modern Data Lakes & Modern ML Infrastructure
- Existing and Emerging Architectural Shifts
- Feature Store: Overview and Reference Architecture
- AWS Perspective on Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data architects & analysts, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Gandhi Raketla, Senior Solutions Architect, AWS
- German Osin, Senior Solutions Architect, Provectus
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-feature-store-as-data-foundation-for-ml-nov-2020/
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud | Torsten Steinbach
Cloud is a sharing economy that reduces your spending. But does this also apply to data and analytics? Doesn't this require you to provision dedicated data warehouse systems to run analytics SQL queries on terabytes of data? With IBM Cloud, the answer is no. By using serverless analytics via IBM Cloud SQL Query, you can analyze your data directly where it sits, be it in IBM Cloud Object Storage or in your NoSQL databases. Due to the serverless nature of SQL Query, you only pay for your queries depending on the data volume that they process. There are no standing costs. You do not need to provision and wait for a data warehouse. But you can still run SQLs on terabytes of data.
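The pay-per-query model described above comes down to simple arithmetic: cost scales with the data volume a query processes, and nothing is charged while no query runs. The sketch below assumes a flat price per terabyte scanned; the function name and the rate used in the example are placeholders for illustration, not IBM Cloud's actual pricing.

```python
BYTES_PER_TB = 10 ** 12  # decimal terabyte, an assumption for this sketch


def query_cost(bytes_scanned, price_per_tb):
    """Cost of one query under a flat, hypothetical per-terabyte rate."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def monthly_cost(scans, price_per_tb):
    """Total for a list of per-query scan sizes; zero queries means zero cost."""
    return sum(query_cost(size, price_per_tb) for size in scans)


# Two queries scanning 2.5 TB and 0.5 TB at a made-up rate of 5.0 per TB.
example_total = monthly_cost([2_500_000_000_000, 500_000_000_000], price_per_tb=5.0)
```

Under this model an idle month costs nothing, which is the contrast with a provisioned data warehouse that the abstract draws.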
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL? | Torsten Steinbach
You don't necessarily have to set up a relational database, create tables and load data in order to use a surprisingly rich set of SQL capabilities on your data in the cloud. IBM SQL Query lets you analyze terabytes of distributed data in heterogeneous formats with a complete ANSI SQL dialect in a completely serverless usage model, elegantly ETL data between formats and partitioning layouts as needed, and run complex time series transformations, analysis and correlations with advanced built-in time series SQL algorithms that are differentiating in the entire industry. It also supports a complete PostGIS-compliant geospatial SQL function set. Come explore the stunningly advanced world of SQL without a database in IBM Cloud.
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud | Torsten Steinbach
Agile user and workload insights are one of the key elements of a cloud-native solution. When done well, this represents a real competitive advantage. In this session, we show you how to run cloud-native clickstream analysis with IBM Cloud. By combining serverless mechanisms like object storage for affordable and scalable persistency with SQL Query for serverless analysis of your clickstream data, you can establish a very cost-effective clickstream analysis pipeline easily and quickly.
IBM THINK 2019 - Self-Service Cloud Data Management with SQL | Torsten Steinbach
SQL is a powerful language to express data transformations. But did you know that you can also use IBM Cloud SQL to convert data between various data formats and layouts on disk? In this session, you will see the full power of using SQL Query to move and transform your cloud data in an entirely self-service fashion. You can specify any data format, layout or partitioning with a simple SQL statement. See how you can move and transform terabytes of data in the cloud in a very scalable fashion while still being charged only for the individual SQL movement and transformation jobs, without standing costs.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality | Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA Connect
Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and Grafana
RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
IBM Cloud Day January 2021 - A well architected data lake
1. IBM Cloud Day 2021
Well Architected Data Lake
James Bennett, Offering Manager
Torsten Steinbach, Senior Technical Staff Member
2. Two Cloud Data Lake Sessions Today
• The Well Constructed Architecture of a Modern Data Lake
  o Introductory Session
  o What we provide, how you can consume it
  o Light introduction to deeper architecture
• Deep Dive into Cloud Native Data Lakes with IBM Cloud
  o Session led by Torsten
  o Everything you need to know about building a Data Lake on IBM Cloud
  o Includes our Covid-19 Data Lake Implementation
3. Cloud Data Lake for the Enterprise
Organizations need the ability to:
o Visualize data and build data-driven applications
o Increase data flexibility and accessibility
o Provide data governance to retain data authenticity
o Gain speed with data insights
o Collect, explore and analyze data
Data Architects; Business and Data Analysts; Data scientists and application developers
4. Cloud Data Lake Evolutionary Context
The 1990s: Enterprise Data Warehouses
Tightly integrated and optimized systems
2000: Hadoop
Introduced open data formats & easy scaling on commodity HW
Today: Cloud-Native (Serverless Analytics-aaS)
• Elasticity
• Pay-per-query
• Data in object store
• Disaggregated architecture
• Increasingly real-time first
5. Case Study
Business Problem
Need the ability to effectively analyze data from remote locations to gain insights with cost-effective, secure, on-demand analytics and long-term data retention
Solution
o Nightly batch exports from operational production databases in factory locations are automatically uploaded to the data lake in the cloud (central COS bucket).
o LoB engineers subscribe to data in the data lake, which is then ETLed with SQL Query to tenant-specific zones (tenant-specific COS buckets).
o Future updates of data lake data in the central COS bucket are automatically ETLed right away to the tenant-specific COS bucket via Cloud Functions events.
o LoB engineers explore, experiment and do data preparation using SQL Query on tenant-specific buckets.
o LoB engineers use Watson Studio to run data science, visualize and present insights to executives.
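The ETL step from the central bucket to a tenant-specific bucket can be sketched as a single SQL statement assembled in Python. This is only an illustration: the bucket URLs, the `plant_id` and `export_date` columns, and the helper name `build_tenant_etl` are hypothetical, and the statement shape (`FROM ... STORED AS ... INTO ... STORED AS ... PARTITIONED BY`) is an assumption that should be checked against the SQL Query service documentation before use.

```python
def build_tenant_etl(source_url: str, target_url: str,
                     tenant_column: str, tenant_id: str,
                     partition_column: str) -> str:
    """Build one SQL statement that copies a tenant's rows to its own bucket.

    The statement shape is an assumed pattern, not a verified grammar.
    """
    # Double any single quote so the tenant id stays a valid SQL string literal.
    safe_id = tenant_id.replace("'", "''")
    return (
        f"SELECT * FROM {source_url} STORED AS CSV "
        f"WHERE {tenant_column} = '{safe_id}' "
        f"INTO {target_url} STORED AS PARQUET "
        f"PARTITIONED BY ({partition_column})"
    )


# Hypothetical bucket locations and column names, for illustration only.
statement = build_tenant_etl(
    source_url="cos://us-south/central-lake/exports/",
    target_url="cos://us-south/tenant-a-zone/exports/",
    tenant_column="plant_id",
    tenant_id="plant-a",
    partition_column="export_date",
)
print(statement)
```

Keeping the whole transformation in one statement mirrors the self-service idea in the case study: the same text could be submitted from a console, a notebook, or an event-triggered function.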
6. Case Study
Business Problem
Need the ability to effectively ingest and analyze data from multiple vendors in various data formats to gain competitive insights
Solution
o Ingest pricing data from 20+ external vendors and persist it in Cloud Object Storage.
o Data Engineers prep the data by joining vendor data with the on-premises data warehouse.
o Data Engineers then process result sets using Analytics Engine (Spark) and Db2 Warehouse on Cloud.
o LoB engineers explore, experiment and do data preparation using SQL Query on tenant-specific buckets.
o LoB engineers use Watson Studio (notebooks) to run data science, visualize and present actionable competitive insights to executives.
7. Use Cases
Replicate on-prem DB to cloud data lake for analytics
o Capture database change feed into Kafka in Cloud
o Land Kafka data to object storage
o Prepare replicated change feed for analytics
o Query for insights
o Present & visualize insights
Collect, historize & analyze IoT data
o Land IoT message data through Event Streams (Kafka)
o Prepare, cleanse, extract and enrich IoT data
o Query for insights
o Present & visualize insights
Move existing Hadoop Workload to Cloud
o Replace HDFS with cloud-native storage: object storage
o Run Hadoop processing in fully managed Hadoop service: Analytics Engine
o Interactive analytics through Watson Studio
AIOps, gain operational & business insights from solution logs
o Collect full solution telemetry (logs)
o Prepare, cleanse, extract and enrich data from logs
o Query for insights
o Present & visualize insights
SQL in Place: Reduce cost and decouple workload from DWHs
o Use data lake as landing and preparation storage before data gets ingested to DWH
o Archive data from DWH to data lake for affordable SQL-enabled archive
o Automate ETL and enable SQL federation across data lake and DWH
8. Cloud Pak for Data as a Service
Built On: IBM Cloud
Uses: IBM Cloud Data Lake
o Storage: COS
o Analytics: SQL Query
o Streaming: Event Streams
o Transformation: Spark
o Databases: Cloud Databases
9. IBM Cloud enables a secure, fully integrated set of Cloud Data Services
Scalability
Start small and grow large without overprovisioning for anticipated scale.
Efficiency and Speed
Get applications to market quickly, without worrying about underlying infrastructure costs, maintenance, and provider security
Flexibility
Pick and choose services to fit your needs, customize applications and expand across geos seamlessly
Security
Common security integrations with Identity and Access Management, customer managed encryption key, and common compliance roadmap
12. Key Business Outcome: DataOps
Enable safe self-service access to data across users with multiple skill levels, enabling them to use the power of AI securely at speed
Diagram labels:
o Hybrid Data Sources: Object Stores, Data Lake, Databases, Unstructured & Streaming Data
o Integrated Data Governance: Intelligent data catalog, Assess Risk, Discover data, Self-serve find & 'deploy' data, Data Privacy enforced, Business meaning
o Data Consumers: Data Science Tooling, Streaming Analytics, Analytical Dashboards, AI Applications, Data Prep Tools
• Extract greater value from your data assets through better data organization and intelligent data discovery
• Enable AI to help you derive better insights from your organized data
• Improve data risk strategies by assessing risks across your data estates
• Increase user productivity through safe self-service data access
• Unified end-user experience driven by seamlessly integrated services across the platform
13. Cloud Pak for Data as a Service
Built On: IBM Cloud
Uses: IBM Cloud Data Lake
o Storage: COS
o Analytics: SQL Query
o Streaming: Event Streams
o Transformation: Spark
o Databases: Cloud Databases
14. Why IBM Cloud Data Lake?
• Industry-leading optimizations for SQL-native location & timeseries data and indexing of object storage data
• High velocity due to self-service data management, preparation & analytics with an extremely low barrier of entry thanks to the serverless model
• Most secure data lake option in cloud due to unique BYOK and KYOK key services in IBM Cloud
• Enables Cloud Economics, Resiliency and Scale for Big Data
16. IBM Cloud Data Lake – Big Picture
Diagram labels: Telemetry Data, Streaming, ETL, Explore, Prep, Enrich, Optimize, Analyze, Analytics
Cloud Data Lake
✓ Seamless Elasticity
✓ Seamless Scalability
✓ Highly Cost Effective
✓ Long Term Retention
✓ Any data formats
Optional: DWH, Databases
✓ Response Time SLAs
✓ Warm High-quality Data only
17. IBM Serverless Stack for Analytics
Serverless Storage: Object Storage
Only pay for volume of data that you really store
Serverless Runtimes: Cloud Functions
Only pay for CPU that you really consume
Serverless Analytics: SQL Query
Only pay for amount of data that you really scan
Properties of Serverless:
• No management of resources, hosts and processes
• Auto-scaling and auto-provisioning based on actual load
• Precise billing based on really consumed system resources (memory, storage, CPU, network, I/O)
• High availability is always implicit
18. IBM SQL Query – The Central Cloud Data Lake Service
Serverless SQL Query Service: Data Ingestion + Data Management + Data Transformation + Analytics
Cloud Data: Object Storage, RDBMS
Users: Developers, Data Engineers, Data Analysts, Data Scientists
✓ Supports ad-hoc and unknown data structures
✓ ETL & ELT Support
✓ 100% Pay-as-you-go ($5/TB)
✓ 100% API enabled
✓ Automatic Big Data Scale-Out with Spark
✓ 100% Self service, No Setup
✓ Built-In Database Catalog & Data Skipping
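As a rough illustration of the pay-as-you-go model, the cost of a query can be estimated from the volume of data it scans. The sketch below uses the 5 USD per TB figure quoted on the slide; treating a terabyte as 10^12 bytes and the function name `estimate_query_cost` are assumptions, and actual billing rules may differ.

```python
PRICE_PER_TB_USD = 5.0      # figure quoted on the slide
BYTES_PER_TB = 10 ** 12     # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the charge for one query from the bytes it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# A query that scans 250 GB of objects.
cost = estimate_query_cost(250 * 10 ** 9)
print(f"Estimated cost: ${cost:.2f}")
```

Because the charge scales with scanned bytes, anything that reduces scanning (columnar formats, partitioning, data skipping) lowers the bill directly.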
19. IBM SQL Query Architecture
Flow between Application, SQL Query and Cloud Data Services: (1) Submit SQL, (2) Read data, (3) Write data, (4) Read results
Cloud Data Services: Cloud Object Storage, Event Streams, Db2 on Cloud
SQL Query features: Geospatial SQL, Timeseries SQL, Data Skipping, Hive Metastore
• Using IBM Analytics Engine service (Spark clusters aaS)
• Large farm of Spark clusters auto-provisioned & auto-managed in background
• Managing a hot pool of Spark applications (a.k.a. kernels, using Jupyter Kernel Gateway)
• SQL grammar sandbox
• Auto-scaling of each serverless SQL job inside large Spark clusters using dynamic resource allocation
• Intrinsically HA (dispatching across Spark environments in each availability zone)
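The four-step job flow (submit SQL, read data, write data, read results) can be mimicked locally with a toy model. This is not how the service is implemented: a Python dictionary stands in for Cloud Object Storage, an in-memory SQLite database stands in for the Spark engine, and the object keys and the `run_sql_job` helper are hypothetical.

```python
import csv
import io
import sqlite3

# Toy stand-in for Cloud Object Storage: object key -> CSV text.
object_store = {
    "landing/readings.csv": "device,temp\na,20\na,24\nb,30\n",
}


def run_sql_job(sql, source_key, target_key, table="src"):
    """Run one job: read the source object, execute SQL, write a result object."""
    # Step 2: read data from the object store.
    rows = list(csv.reader(io.StringIO(object_store[source_key])))
    header, data = rows[0], rows[1:]
    # An in-memory SQLite database stands in for the query engine (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    db.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)
    cursor = db.execute(sql)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())
    db.close()
    # Step 3: write the result set back to the object store.
    object_store[target_key] = out.getvalue()
    return target_key


# Step 1: the application submits SQL.
result_key = run_sql_job(
    "SELECT device, AVG(CAST(temp AS REAL)) AS avg_temp "
    "FROM src GROUP BY device ORDER BY device",
    source_key="landing/readings.csv",
    target_key="results/avg_temp.csv",
)
# Step 4: the application reads the results from the object store.
result_text = object_store[result_key]
print(result_text)
```

The point of the sketch is the shape of the interaction: results are persisted as objects next to the input data, so the application fetches them from storage rather than from a database connection.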
20. IBM SQL Query – Access Patterns
Create
Query
SQL Console
Watson Studio Notebooks
Cloud Functions
Integrate
Explore
Deploy
Python SDK
REST API
JDBC
Object Store Console
Event Streams Console
21. IBM Cloud Data Lake – Meta Data
Meta Data
• Schema, Partitioning, Statistics: Hive Metastore, Kafka Schema Registry
• Data Skipping Indexes: Xskipper
• ACID: Iceberg, Deltalake
• Governance Policies & Lineage: Watson Knowledge Catalog
Engines: Spark, Serverless SQL
Cloud Data: Object Storage, RDBMS
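The idea behind data skipping indexes can be shown with a minimal min/max sketch: per-object statistics are kept as metadata, and objects whose value range cannot match a query predicate are never read. This is a simplified illustration, not the Xskipper API; the `ObjectStats` type, the object keys, and the statistics values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectStats:
    """Min/max metadata kept for one column of one stored object."""
    key: str
    min_value: float
    max_value: float


def objects_to_scan(index, low, high):
    """Return keys of objects whose [min, max] range overlaps [low, high]."""
    return [s.key for s in index
            if s.max_value >= low and s.min_value <= high]


# Hypothetical index over three Parquet objects holding a temperature column.
index = [
    ObjectStats("part-0.parquet", 0.0, 9.9),
    ObjectStats("part-1.parquet", 10.0, 19.9),
    ObjectStats("part-2.parquet", 20.0, 35.0),
]

# Predicate: temperature BETWEEN 18.0 AND 22.0
selected = objects_to_scan(index, 18.0, 22.0)
print(selected)
```

Only two of the three objects are selected here, so a query with this predicate would scan less data, which matters in a model where cost follows the volume scanned.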
22. IBM Cloud Data Lake – 2021 Architecture
Components: Event Streams, SQL Query, Object Storage (COS), Meta Data
Meta Data: Integrated Hive Metastore + Kafka Schema Registry + ACID (Iceberg)
Capabilities: Real-Time Queries, Batch Queries, Stream Xform & Joins, Stream data landing, Schema management & enforcement, ETL & Data Preparation