Building a Data Driven Culture and AI Revolution With Gregory Little | Current 2022
Transforming business or mission through AI/ML doesn't start with technology but with culture…and an audit. At least as much is true for the US Department of Defense (DoD), which presents significant modernization challenges because of its mission scope, expansive global footprint, and massive size - with over 2.8 million people, it is the largest employer in the world. Greg Little discusses how establishing the DoD’s annual audit became a surprising accelerator for the department’s data and analytics journey. It revealed the foundational needs for data management to run a $3 trillion in assets enterprise, and its successful implementation required breaking through deeply entrenched cultural and organizational resistance across DoD.
In this session, Greg will discuss what it will take to guide the evolution of technology and culture in parallel: leadership, technology that enables rapid scale and a complete & reliable data flow, and a data driven culture.
This describes a conceptual model approach to designing an enterprise data fabric. This is the set of hardware and software infrastructure, tools and facilities to implement, administer, manage and operate data operations across the entire span of the data within the enterprise across all data activities including data acquisition, transformation, storage, distribution, integration, replication, availability, security, protection, disaster recovery, presentation, analytics, preservation, retention, backup, retrieval, archival, recall, deletion, monitoring, capacity planning across all data storage platforms enabling use by applications to meet the data needs of the enterprise.
The conceptual data fabric model represents a rich picture of the enterprise’s data context. It embodies an idealised and target data view.
Designing a data fabric enables the enterprise respond to and take advantage of key related data trends:
• Internal and External Digital Expectations
• Cloud Offerings and Services
• Data Regulations
• Analytics Capabilities
It enables the IT function demonstrate positive data leadership. It shows the IT function is able and willing to respond to business data needs. It allows the enterprise to meet data challenges
• More and more data of many different types
• Increasingly distributed platform landscape
• Compliance and regulation
• Newer data technologies
• Shadow IT where the IT function cannot deliver IT change and new data facilities quickly
It is concerned with the design an open and flexible data fabric that improves the responsiveness of the IT function and reduces shadow IT.
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Architecting Agile Data Applications for ScaleDatabricks
Data analytics and reporting platforms historically have been rigid, monolithic, hard to change and have limited ability to scale up or scale down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report and IT says it will take 6 months to add that column because it doesn’t exist in the datawarehouse. As a former DBA, I can tell you the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk will talk about how to architect modern data and analytics platforms in the cloud to support agility and scalability. We will include topics like end to end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications and finally taking advantage of the cloud for infinite scalability both up and down.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
This describes a conceptual model approach to designing an enterprise data fabric. This is the set of hardware and software infrastructure, tools and facilities to implement, administer, manage and operate data operations across the entire span of the data within the enterprise across all data activities including data acquisition, transformation, storage, distribution, integration, replication, availability, security, protection, disaster recovery, presentation, analytics, preservation, retention, backup, retrieval, archival, recall, deletion, monitoring, capacity planning across all data storage platforms enabling use by applications to meet the data needs of the enterprise.
The conceptual data fabric model represents a rich picture of the enterprise’s data context. It embodies an idealised and target data view.
Designing a data fabric enables the enterprise respond to and take advantage of key related data trends:
• Internal and External Digital Expectations
• Cloud Offerings and Services
• Data Regulations
• Analytics Capabilities
It enables the IT function demonstrate positive data leadership. It shows the IT function is able and willing to respond to business data needs. It allows the enterprise to meet data challenges
• More and more data of many different types
• Increasingly distributed platform landscape
• Compliance and regulation
• Newer data technologies
• Shadow IT where the IT function cannot deliver IT change and new data facilities quickly
It is concerned with the design an open and flexible data fabric that improves the responsiveness of the IT function and reduces shadow IT.
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Architecting Agile Data Applications for ScaleDatabricks
Data analytics and reporting platforms historically have been rigid, monolithic, hard to change and have limited ability to scale up or scale down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report and IT says it will take 6 months to add that column because it doesn’t exist in the datawarehouse. As a former DBA, I can tell you the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk will talk about how to architect modern data and analytics platforms in the cloud to support agility and scalability. We will include topics like end to end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications and finally taking advantage of the cloud for infinite scalability both up and down.
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
Ryan Malecky - Solutions Architect, EdTech, AWS
Rajakumar Sampathkumar - Sr. Technical Account Manager, AWS
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
"Unlike just a few years ago, today the lakehouse architecture is an established data platform embraced by all major cloud data companies such as AWS, Azure, Google, Oracle, Microsoft, Snowflake and Databricks.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
Then we focus on serverless for streaming use cases. Serverless concepts are well-known from developers triggering hundreds of thousands of AWS Lambda functions at a negligible cost. However, the same concept becomes more interesting when looking at data platforms.
We have all heard about the principle ""It runs best on Powerpoint"", so I decided to skip slides here and bring a serverless demo instead:
A hands-on, fun, and interactive serverless streaming use case example where we ingest live events from hundreds of mobile devices (don't miss out - bring your phone and be part of it!!). Based on this use case I will critically explore how much of a modern lakehouse is serverless and how we implemented that at Databricks (spoiler alert: serverless is everywhere from data pipelines, workflows, optimized Spark APIs, to ML).
TL;DR benefits for the Data Practitioners:
-Recap the OSS foundation of the Lakehouse architecture and understand its appeal
- Understand the benefits of leveraging a lakehouse for streaming and what's there beyond Spark Structured Streaming.
- Meat of the talk: The Serverless Lakehouse. I give you the tech bits beyond the hype. How does a serverless lakehouse differ from other serverless offers?
- Live, hands-on, interactive demo to explore serverless data engineering data end-to-end. For each step we have a critical look and I explain what it means, e.g for you saving costs and removing operational overhead."
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
This presentation will cover Cloud history and Microsoft Azure Data Analytics capabilities. Moreover, it has a real-world example of DW modernization. Finally, we will check the alternative solution on Azure using Snowflake and Matillion ETL.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including, connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including, Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora— a MySQL-compatible, highly-available relational database engine, which provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Data Engineering is the process of collecting, transforming, and loading data into a database or data warehouse for analysis and reporting. It involves designing, building, and maintaining the infrastructure necessary to store, process, and analyze large and complex datasets. This can involve tasks such as data extraction, data cleansing, data transformation, data loading, data management, and data security. The goal of data engineering is to create a reliable and efficient data pipeline that can be used by data scientists, business intelligence teams, and other stakeholders to make informed decisions.
Visit by :- https://www.datacademy.ai/what-is-data-engineering-data-engineering-data-e/
Build Real-Time Applications with Databricks StreamingDatabricks
In this presentation, we will study a recent use case we implemented recently. In this use case we are working with a large, metropolitan fire department. Our company has already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and Azure SQL Server Analytics Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure.
This channel should serve up the following information: •The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be map which automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments.
In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service) that is a tool for curating and processing massive amounts of data and developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and Machine Learning Library (Mllib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsAnant Corporation
During this lunch, we’ll review open-source reverse ETL tools to uncover how to send data back to SaaS systems.
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Data Engineer’s Lunch Weekly at 12 PM EST Every Monday:
https://www.meetup.com/Data-Wranglers-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
#data #dataengineering #datagovernance
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Intuit Data Ecosystem supports unique consumer and small business assets at scale, and handle petabytes of customer data. We have 8M active small business customers and 16M paid workers that uses Intuit Quick Books and Quick Books Payroll Products. Huge customer base and large volumes of data always challenges the data teams in terms of freshness of data, correctness of data etc. This presentation is intended to cover such problems we faced at Intuit along with the data observability model we follow to cure, detect and prevent data Issues. We would like to provide deep insights into the implementations and the impact of some of the great work done by Intuit in this direction.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
Ryan Malecky - Solutions Architect, EdTech, AWS
Rajakumar Sampathkumar - Sr. Technical Account Manager, AWS
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
"Unlike just a few years ago, today the lakehouse architecture is an established data platform embraced by all major cloud data companies such as AWS, Azure, Google, Oracle, Microsoft, Snowflake and Databricks.
This session kicks off with a technical, no-nonsense introduction to the lakehouse concept, dives deep into the lakehouse architecture and recaps how a data lakehouse is built from the ground up with streaming as a first-class citizen.
Then we focus on serverless for streaming use cases. Serverless concepts are well-known from developers triggering hundreds of thousands of AWS Lambda functions at a negligible cost. However, the same concept becomes more interesting when looking at data platforms.
We have all heard about the principle ""It runs best on Powerpoint"", so I decided to skip slides here and bring a serverless demo instead:
A hands-on, fun, and interactive serverless streaming use case example where we ingest live events from hundreds of mobile devices (don't miss out - bring your phone and be part of it!!). Based on this use case I will critically explore how much of a modern lakehouse is serverless and how we implemented that at Databricks (spoiler alert: serverless is everywhere from data pipelines, workflows, optimized Spark APIs, to ML).
TL;DR benefits for the Data Practitioners:
-Recap the OSS foundation of the Lakehouse architecture and understand its appeal
- Understand the benefits of leveraging a lakehouse for streaming and what's there beyond Spark Structured Streaming.
- Meat of the talk: The Serverless Lakehouse. I give you the tech bits beyond the hype. How does a serverless lakehouse differ from other serverless offers?
- Live, hands-on, interactive demo to explore serverless data engineering data end-to-end. For each step we have a critical look and I explain what it means, e.g for you saving costs and removing operational overhead."
A Work of Zhamak Dehghani
Principal consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
This presentation will cover Cloud history and Microsoft Azure Data Analytics capabilities. Moreover, it has a real-world example of DW modernization. Finally, we will check the alternative solution on Azure using Snowflake and Matillion ETL.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including, connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including, Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora— a MySQL-compatible, highly-available relational database engine, which provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Data Engineering is the process of collecting, transforming, and loading data into a database or data warehouse for analysis and reporting. It involves designing, building, and maintaining the infrastructure necessary to store, process, and analyze large and complex datasets. This can involve tasks such as data extraction, data cleansing, data transformation, data loading, data management, and data security. The goal of data engineering is to create a reliable and efficient data pipeline that can be used by data scientists, business intelligence teams, and other stakeholders to make informed decisions.
Visit by :- https://www.datacademy.ai/what-is-data-engineering-data-engineering-data-e/
Build Real-Time Applications with Databricks StreamingDatabricks
In this presentation, we will study a recent use case we implemented recently. In this use case we are working with a large, metropolitan fire department. Our company has already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and Azure SQL Server Analytics Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure.
This channel should serve up the following information: •The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be map which automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments.
In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service) that is a tool for curating and processing massive amounts of data and developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and Machine Learning Library (Mllib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsAnant Corporation
During this lunch, we’ll review open-source reverse ETL tools to uncover how to send data back to SaaS systems.
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Data Engineer’s Lunch Weekly at 12 PM EST Every Monday:
https://www.meetup.com/Data-Wranglers-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
#data #dataengineering #datagovernance
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Intuit Data Ecosystem supports unique consumer and small business assets at scale, and handle petabytes of customer data. We have 8M active small business customers and 16M paid workers that uses Intuit Quick Books and Quick Books Payroll Products. Huge customer base and large volumes of data always challenges the data teams in terms of freshness of data, correctness of data etc. This presentation is intended to cover such problems we faced at Intuit along with the data observability model we follow to cure, detect and prevent data Issues. We would like to provide deep insights into the implementations and the impact of some of the great work done by Intuit in this direction.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Building the Artificially Intelligent EnterpriseDatabricks
This session looks at where we are today with data and analytics and what is needed to transition to the Artificially Intelligent Enterprise.
How do you mobilise developers to exploit what data scientists and business analysts have built? How do you align it all with business strategy to maximise business outcomes? How do you combine BI, predictive and prescriptive analytics, automation and reinforcement learning to get maximum value across the enterprise? What is the blueprint for building the artificially intelligent enterprise?
•Data and analytics – Where are we?
•Why is the journey only half-way done?
•2021 and beyond – The new era of AI usage and not just build
•The requirement – event-driven, on-demand and automated analytics
•Operationalising what you build – DataOps, MLOps and RPA
•Mobilising the masses to integrate AI into processes – what needs to be done?
•Business strategy alignment – the guiding light to AI utilisation for high reward
•Agility step change – the shift to no-code integration of AI by citizen developers
•Recording decisions, and analysing business impact
•Reinforcement-learning – transitioning to continuous reward
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit out website at http://casertaconcepts.com/.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
Many data scientists are well grounded in creating accomplishment in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but it doesn’t count until their product gets to production and in use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
Want to know more about Common Data Model and Service? You need to understant what's the difference between CDS for Apps and Analytics? Feel free to use these slides and send me your feed backs.
Empowering Business & IT Teams: Modern Data Catalog RequirementsPrecisely
As the demand for data-driven insights continues to grow, the importance of data catalogs will only increase. A modern data catalog addresses new use cases requiring more immediate and intelligent data discovery to drive complete and informed business outcomes.
In this demo, you will hear how the Precisely Data Integrity Suite’s Data Catalog is the connective tissue that empowers business and IT teams to discover, understand, and trust their critical data. Requirements to meet those new use cases include:
· Discovery, lineage, and relationships across silos for more informed insights
· Interoperability with data platforms and tech stacks to increase ROI
· Machine learning to drive more significant insights
· Data observability to alert users to data changes and anomalies
· Business-friendly data governance to advance understanding & accountability
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
Databricks' founders caused a seismic shift in data analysis community when they created Apache Spark which has become a cornerstone of Big Data processing pipelines and tools in large and small companies all around the world. Now they've built a revolutionary, comprehensive and easy-to-use platform around Apache Spark and their other inventions, such as MLFlow and Koalas frameworks and most importantly the Data Lakehouse: a concept of fusing data warehouse and data lake architectures into a single versatile and fast platform. Technical foundation for Databricks Data Lakehouse is Delta Lake. More than 7000 organizations today rely on Databricks to enable massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics. Come to the talk and see the demo to find out why.
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
The Census Bureau is the U.S. government's largest statistical agency with a mission to provide current facts and figures about America's people, places and economy. The Bureau operates a large number of surveys to collect this data, the most well known being the decennial population census. Data is being collected in increasing volumes and the analytics solutions must be able to scale to meet the ever increasing needs while maintaining the confidentiality of the data. Past data analytics have occurred in processing silos inhibiting the sharing of information and common reference data is replicated across multiple system. The use of the Hortonworks Data Platform, Hortonworks Data Flow and other open-source technologies is enabling the creation of a cloud-based enterprise data lake and analytics platform. Cloud object stores are used to provide scalable data storage and cloud compute supports permanent and transient clusters. Data governance tools are used to track the data lineage and to provide access controls to sensitive data.
Describes what Enterprise Data Architecture in a Software Development Organization should cover and does that by listing over 200 data architecture related deliverables an Enterprise Data Architect should remember to evangelize.
What Does Artificial Intelligence Have to Do with IT Operations?Precisely
From the early days of IT, organizations have grappled with the challenges of understanding how well their infrastructure is performing in support of the business. They have used a plethora of tools to detect, manage, and resolve problems that are causing disruption of services, but still struggle to achieve a unified, cross-domain understanding of what is happening across their IT infrastructure. Fortunately, over the past few years analytics platforms like Splunk, Elastic, and others have emerged to address requirements around IT Operations Analytics (ITOA). Now today the buzz is around AIOps – Artificial Intelligence Operations. But what is AIOps, and what can it do to help organizations address IT challenges. In this presentation you will get a better understanding of:
What is Artificial Intelligence for IT Operations
What are the required technologies for success at AIOps
What challenges exist for achieving AIOPs
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
What's the origin of Big Data? What are the real life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organizations?
Unlocking New Insights with Information DiscoveryAlithya
Edgewater Ranzal invited to present Unlocking New Insights with Information Discovery at the Oracle Hyperion User Group Minnesota (HUGmn) Tech Day 2015. Presented an introduction to Oracle Endeca Information Discovery (OEID), a powerful database tool for structured and unstructured data.
Similar to Building a Data Driven Culture and AI Revolution With Gregory Little | Current 2022 (20)
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges."
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog Post : https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03"
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize the benefits provided by Kafka and Kafka Connect to facilitate the transfer of data from the source to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our desire to effectively manage rapid growth(marked by a growing number of producers, consumers, and daily messages), ensuring proper monitoring and consistency. Consistency is crucial, especially in instances where we employ Single Message Transforms to manipulate records like filtering based on their keys or converting a JSON Object into a JSON string.
Monitoring our cluster's health is key and we achieve this through Grafana dashboards and alerts generated through kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with NewRelic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and methods employed to monitor the health of our clusters."
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful tool that encompasses an application and its monitoring to validate Kafka environment availability. The session navigates through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment."
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once when using Kafka as your source and sink. But this API has turned out to not be well suited for use by high level streaming systems, requiring various work arounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, and how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing."
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and the usage of a serverless cloud to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds."
What is tiered storage and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, like for example when will the first upload to the remote object storage take place.
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy to manipulate by users and it took many days to enable stream processing requests – limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream process portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive, stream processing requests are available the same day – previously taking 5 days! We were also able to drive down costs by 10% as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- What were the pain points in the previous environment
- How we transitioned to Confluent without service downtime
- Creating a self-service stream processing portal built on top of Connect and ksqlDB
- Use case of stream process portal"
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow."
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightening talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the ""test, test and test"" mantra that are helping Fidelity Investments reach its goal of a future with zero down-time."
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we’ll discuss how you can use SSH bastions or a self managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration, and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform.0 characters0 characters"
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLM's are fascinating - so let's grab a drink and find out how these systems are built and dive deep into their inner workings. In the length of time it to enjoy a round of drinks, you'll understand the inner workings of these models. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices in fine tuning your monitoring. We will delve into which metrics are the indicators for cluster’s availability and performance and are the most helpful when debugging client applications."
Kafka Streams relies on state restoration for maintaining standby tasks as failure recovery mechanism as well as for restoring the state after rebalance scenarios. When you are scaling up or down your application instances, it is necessary to know the current state of the restoration process for each active and standby task in order to prevent a long restoration process as much as possible. During this presentation, you will get an understanding of how KIP-869 provides valuable information about the current active task restoration after a rebalance and KIP-988 opens a window to the continuous process of standby restoration. When you encounter a situation in which you need to choose whether or not to scale up or down your application instances, both KIPs will be an invaluable ally for you.
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internal and workflow
- Understanding the producer configs like linger.ms, batch.size, buffer.memory and their impact on performance
- Learning about producer configs like max.block.ms, delivery.timeout.ms, request.timeout.ms and retries to make producer more resilient.
- Discuss configs like enable.idempotence, max.in.flight.requests.per.connection and transaction related configs to achieve delivery guarantees.
- Q&A session with attendees to address specific questions and concerns."
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract:
discuss the anatomy of a data contract, its main elements and, how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
4. Our Path Forward
4
Enterprise Data, Business Analytics, and AI/ML enablement pyramid
AI and Analytics
Outcomes
Data Architecture
Data ecosystem
Analytics and AI infrastructure
Vision
• What are we trying to achieve?
• What value do we hope to capture?
• Which are the main AI and analytics use cases?
• How much value do we expect from each?
• What are their key constraints and requirements?
• Have we thought of AI assurance?
Enablement Questions
• Do we have the right talent and operating model?
• Do we have a thorough analytics and AI process and
capabilities?
• Do we have single sources of truth for key information
needs?
• Have we launched basic data hygiene actions?
• Do we know what data we have, what it means, and
how to access it?
• How should we grow and manage the data ecosystem?
(e.g. enable accessible, timely, trustworthy, secure data)
• Provide enterprise data management, analytics, and AI
capabilities to accelerate decision advantage from the
boardroom to the battlefield
• JADC2
• Business performance optimization (e.g. business
analytics/metrics)
• AI assurance as-a-service
• Talent management (embedded tactical data teams,
embedded performance teams, upskilling, recruiting)
• Advana, JOS, AI/MLOps platform (includes things like
model catalog, feature store, etc.)
• Data labeling as-a-service
Our priorities in these areas
Data Catalog OSD Policy
• Advana
• Data Engineering
• Data Ontology Foundry
• Data Quality
• Data Catalogue
• API Infrastructure
8. One good thing that came out of it was Advana – DoD’s Data
Management and Analytics Environment
8
Security
Network &
Infrastructure
Service Layer/ API
MLOps/Self Service
AI/ML SW Factory
Data Governance
Data Ops
UI / Apps
Platform
Service
Portal &
Home Page
Data Catalog
Decision
Support
Tools/Products
Self/Service Data/BI
Tools (COTS)
User Profile
File Store
Search
Access
Control
Ontology/
Semantic
Data
Collaboration
• Service Desk
• Knowledge Base
• Product Management and
Document Management
• Email Notifications
• Atlassian-as-a-Service
Service Abstraction Layer / Engine
AI/ML Gateway
Data Gateway
Access Control
• RBAC/ABAC
• PII/PHI Masking and handling
• Classification Guidance
• Complex Policy Enforcement
Data Quality
• Accuracy
• Completeness
• Timeliness
• Consistency
Data Acquisition
• Manual Upload/SFTP/API
• Streaming/Batch
• JSON/XML
• REST/SOAP
Data Storage
• S3/Blob (Scalable)
• Parquet/Delta
• DW, Delta
• Bronze/Silver/Gold
• Graph database
Data Processing
• Elastic and MPP
• Spark
• CPU/GPU
• Tagging/Curating
Data Orchestration
• Auto Promotion
• Cross-Domain Solution
• Self-Service Promotion
• Automation
Service Bus
• Messaging (Pub/Sub)
• Service Registry / Discovery
• Security
Security
• IDAM
• Encryption
• Firewall/ continuous Pen Test
• 3rd Party Assessments
Container
Orchestration
• Monitoring
• Metadata Repo
• Security
• Container Registry
CI/CD
• Build
• Scan
• Test
• Deploy
Monitoring
• Logging/Alerting
• Anomalous Behavior
• Performance
• Service Availability
Network
• CAP/BCAP/IAP
• Cross-Domain Integration
• Multi-Cloud Integration
Cloud Infrastructure
• AWS (IL 5, IL6, JWICS) + Azure (NIPR/SIIPR/JWICS)
• Scalable and Elastic
• Regions + Availability Zones (support 24x7 global
SLAs)
• IDS, Firewalls, etc.
Data Catalog/Governance
• Metadata Capture
• Search
• Data Lineage
• Federated Catalog
Self/Service AI/ML
Dev Tools (COTS)
Data Modeling
• Ontology
• Reference/Master Data
• Common Data Models
• Data Architecture
Experiment and Test
• Package/Lib Mgmt.
• Code/Low Code Dev & RPA
Automation
• Feature Store
• Train/Evaluate/Validate
• SW Dev and Test
• Self-Service Publish Model
Deployment and Operations
• Source Code Control
• CI/CD Publish/Deploy
• Model Registry
• Model Serving and Monitoring
• Model Inference Service
• Model Performance
Data Gateway
Automation
Models
API Gateway
Existing Enterprise Capabilities
Global Search
Reference Architecture Enterprise Capabilities
NIPRNET SIPRNET
18. Empower your analysts
• Tools are restricting, bosses are suffocating, and big
organizations prefer predictability
• Create an environment that encourages risk
• Execution strategy:
• Hold Analysts accountable for insights, and then set them free
• Critical thinking should not be underrated
18
20. Ownership of Analytics: Business
• Think, imagine, and move at the pace of our changing world
• Ownership close to outcomes, proactive, and analytical needs
• Successfully analytics usually outside of IT
• We have found a hub and spoke analytical model to be
successful
20
21. In the end we will have happy
executives and analysts
21
Before Data
Driven
After Data
Driven