This document discusses feature stores and their role in modern machine learning infrastructure. It begins with an introduction and agenda. It then covers challenges with modern data platforms and emerging architectural shifts toward data meshes and feature stores. The remainder discusses what a feature store is, reference architectures, and recommendations for adopting feature stores, including leveraging existing AWS services for storage, catalog, query, and more.
Managed Feature Store for Machine Learning (Logical Clocks)
All hyperscale AI companies build their machine learning platforms around a Feature Store.
A feature is a measurable property of some data sample. It could be, for example, an image pixel, a word from a piece of text, the age of a person, a coordinate emitted from a sensor, or an aggregate value like the average number of purchases within the last hour. A Feature Store is a central place to store curated features within an organization.
Feature Stores fuel AI systems: we use them to train machine learning models so that we can make predictions on feature values we have never seen before.
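The "average number of purchases within the last hour" aggregate mentioned above can be sketched in plain Python. The event layout and the dict standing in for a feature store are illustrative assumptions, not any particular product's API:

```python
from datetime import datetime, timedelta

# Raw purchase events: (user_id, timestamp) pairs.
events = [
    ("alice", datetime(2021, 6, 1, 11, 15)),
    ("alice", datetime(2021, 6, 1, 11, 45)),
    ("bob",   datetime(2021, 6, 1, 9, 0)),   # outside the one-hour window
]

def purchases_last_hour(user_id, now, events):
    """Count a user's purchases in the hour before `now` -- one engineered feature."""
    window_start = now - timedelta(hours=1)
    return sum(1 for uid, ts in events if uid == user_id and window_start <= ts <= now)

# A dict standing in for the feature store: feature name -> {entity key -> value}.
now = datetime(2021, 6, 1, 12, 0)
feature_store = {"purchases_last_hour": {
    uid: purchases_last_hour(uid, now, events) for uid in ("alice", "bob")
}}
print(feature_store["purchases_last_hour"]["alice"])  # 2
```

A real feature store would compute such aggregates on a schedule or a stream and serve them by entity key at low latency, but the shape of the data is the same.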
During this presentation you will learn:
- The concept of a Feature Store and how it can help manage feature data for enterprises and ease the path of data from backend systems and data lakes to data scientists.
- Our take on Feature Stores, including best practices and use cases.
- How to ensure consistent features in both training and serving.
- How to handle governance, access control, and versioning.
- How to create training data in the file format of your choice.
- How to eliminate inconsistency between features in training and inference.
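A common way to eliminate the training/serving inconsistency mentioned above is to route both paths through a single feature function. The function and record layout here are a hypothetical sketch of the pattern, not the Logical Clocks API:

```python
def featurize(raw):
    """Single source of truth for feature engineering, shared by both paths."""
    return {
        "age": raw["age"],
        "is_adult": 1 if raw["age"] >= 18 else 0,
    }

def build_training_row(raw, label):
    # Offline / training path.
    return {**featurize(raw), "label": label}

def serve(raw, model):
    # Online / inference path.
    return model(featurize(raw))

# Because both paths call featurize(), the encodings can never drift apart.
model = lambda feats: "grown-up" if feats["is_adult"] else "minor"
row = build_training_row({"age": 30}, label=1)
print(row)                        # {'age': 30, 'is_adult': 1, 'label': 1}
print(serve({"age": 15}, model))  # minor
```

If training code and serving code each reimplement the `is_adult` rule independently, a change to one silently skews the other; sharing the function removes that failure mode.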
Watch the webinar with a demo: https://www.logicalclocks.com/webinars
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges in implementing Data Mesh systems and focus on the role of open-source projects. Projects like Apache Spark can play a key part in implementing a standardized infrastructure platform for Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to make Data Mesh more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Modern machine learning systems can be very complex and fall into many pitfalls. It is very easy to unintentionally introduce technical debt into such a complex structure. One approach that addresses some of these anti-patterns is a feature store. A feature store is the missing piece that fills the gap between raw data and machine learning models. Not only will it help you handle technical debt, but, even more importantly, it speeds up the time to develop new models.
Delta Lake delivers reliability, security and performance to data lakes. Join this session to learn how customers have achieved 48x faster data processing, leading to 50% faster time to insight after implementing Delta Lake. You’ll also learn how Delta Lake provides the perfect foundation for a cost-effective, highly scalable lakehouse architecture.
Presentation on Data Mesh: this paradigm shift is a new type of ecosystem architecture, a shift toward a modern distributed architecture that treats domain-specific data as "data-as-a-product," enabling each domain to handle its own data pipelines.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Building End-to-End Delta Pipelines on GCP (Databricks)
Delta has been powering many production pipelines at scale in the Data and AI space since it was introduced a few years ago.
Built on open standards, Delta provides data reliability and enhances storage and query performance to support big data use cases (both batch and streaming), fast interactive queries for BI, and machine learning. Delta has matured over the past couple of years on both AWS and Azure and has become the de facto standard for organizations building their Data and AI pipelines.
In today’s talk, we will explore building end-to-end pipelines on the Google Cloud Platform (GCP). Through presentation, code examples, and notebooks, we will build a Delta pipeline from ingest to consumption using our Delta Bronze-Silver-Gold architecture pattern and show examples of consuming the Delta files using the BigQuery connector.
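The Bronze-Silver-Gold pattern can be illustrated with plain Python stand-ins for the three layers. A real pipeline would use Spark DataFrames and Delta tables; the record layout below is an assumption for illustration only:

```python
# Bronze: raw ingested records, kept as-is (including bad rows).
bronze = [
    {"order_id": "1", "amount": "10.5", "country": "US"},
    {"order_id": "2", "amount": "oops", "country": "US"},   # malformed
    {"order_id": "3", "amount": "4.0",  "country": "DE"},
]

def to_silver(rows):
    """Silver: cleaned and typed records; malformed rows are filtered out."""
    out = []
    for r in rows:
        try:
            out.append({"order_id": r["order_id"],
                        "amount": float(r["amount"]),
                        "country": r["country"]})
        except ValueError:
            pass  # a real pipeline would quarantine this row, not drop it silently
    return out

def to_gold(rows):
    """Gold: a business-level aggregate, e.g. revenue per country."""
    revenue = {}
    for r in rows:
        revenue[r["country"]] = revenue.get(r["country"], 0.0) + r["amount"]
    return revenue

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 10.5, 'DE': 4.0}
```

The point of the layering is that each table is reproducible from the one before it: raw data is never lost in Bronze, cleaning logic is concentrated in the Bronze-to-Silver step, and consumers only ever read the curated Gold aggregates.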
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Unified MLOps: Feature Stores & Model Deployment (Databricks)
If you’ve brought two or more ML models into production, you know the struggle that comes from managing multiple data sets, feature engineering pipelines, and models. This talk will propose a whole new approach to MLOps that allows you to successfully scale your models, without increasing latency, by merging a database, a feature store, and machine learning.
Splice Machine is a hybrid (HTAP) database built upon HBase and Spark. The database powers a one-of-a-kind single-engine feature store, as well as the deployment of ML models as tables inside the database. A simple JDBC connection means Splice Machine can be used with any model ops environment, such as Databricks.
The HBase side allows us to serve features to deployed ML models, and generate ML predictions, in milliseconds. Our unique Spark engine allows us to generate complex training sets, as well as ML predictions on petabytes of data.
In this talk, Monte will discuss how his experience running the AI lab at NASA, and as CEO of Red Pepper, Blue Martini Software and Rocket Fuel, led him to create Splice Machine. Jack will give a quick demonstration of how it all works.
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as tracking experiments, reproducibility, deployment tooling, and model versioning. Get your hands dirty by doing a quick ML project with MLflow and releasing it to production to understand the MLOps lifecycle.
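The Tracking component mentioned above records parameters and metrics per run so experiments can be compared later. This tiny plain-Python stand-in (not the MLflow API) shows the kind of bookkeeping Tracking automates:

```python
runs = []  # toy stand-in for an experiment-tracking backend

def log_run(params, metrics):
    """Record one experiment run's parameters and resulting metrics."""
    runs.append({"params": params, "metrics": metrics})

# Two hypothetical training runs with different hyperparameters.
log_run({"lr": 0.1,  "epochs": 5}, {"accuracy": 0.81})
log_run({"lr": 0.01, "epochs": 5}, {"accuracy": 0.87})

# With every run logged, picking the best configuration is a query, not guesswork.
best = max(runs, key=lambda r: r["metrics"]["accuracy"])
print(best["params"])  # {'lr': 0.01, 'epochs': 5}
```

In MLflow itself the same idea is a server-backed API (runs, logged params/metrics, and a model registry on top), which is what makes the results shareable and reproducible across a team.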
Given at the MLOps Summit 2020 - I cover the origins of MLOps in 2018, how MLOps has evolved from 2018 to 2020, and what I expect for the future of MLOps.
Learn to Use Databricks for Data Science (Databricks)
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Using MLOps to Bring ML to Production / The Promise of MLOps (Weaveworks)
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train, and serve ML models, and how to orchestrate between them? While DevOps and GitOps have gained huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction of MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
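A canary rollout, one of the progressive-delivery strategies mentioned above, can be reduced to a deterministic traffic split. This hash-based router is a generic sketch of the idea, not Weaveworks tooling:

```python
import hashlib

def route(user_id: str, canary_percent: int) -> str:
    """Deterministically send ~canary_percent of users to the canary release."""
    # Hash the user id into a stable bucket in [0, 100).
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

# The same user always lands in the same bucket, so sessions are sticky:
# ramping canary_percent from 5 to 50 to 100 only ever moves users one way.
assignments = {uid: route(uid, 10) for uid in ("user-1", "user-2", "user-3")}
print(assignments)
```

For an ML deployment, "canary" would be the new model version; if its live metrics regress, rolling back is just setting the percentage back to zero.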
What’s New with Databricks Machine Learning (Databricks)
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Building a Feature Store around Dataframes and Apache Spark (Databricks)
A Feature Store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so, they need a common data processing platform.
Yelp has operated a connector ecosystem to feed vital data to domain-specific teams and data stores. We share some of our learnings and experiences from operating such a system, and touch on the next phase of its evolution.
MLOps and Data Quality: Deploying Reliable ML Models in Production (Provectus)
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
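A minimal flavor of the Data QA components in the agenda above, assuming a simple record layout: each check returns the offending rows, so a validation pipeline can fail fast or quarantine them before training ever starts.

```python
def check_not_null(rows, column):
    """Return rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]

def check_range(rows, column, lo, hi):
    """Return rows where `column` is present but falls outside [lo, hi]."""
    return [r for r in rows if r.get(column) is not None and not (lo <= r[column] <= hi)]

rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # fails the null check
    {"id": 3, "age": 213},    # fails the range check
]
violations = check_not_null(rows, "age") + check_range(rows, "age", 0, 120)
print(len(violations))  # 2
```

Production data-quality frameworks add scheduling, profiling, and alerting on top, but the core contract is the same: every batch either passes its checks or is blocked from feeding the models downstream.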
Building Lakehouses on Delta Lake with SQL Analytics Primer (Databricks)
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Data Quality Patterns in the Cloud with Azure Data Factory (Mark Kromer)
This is my slide presentation from Pragmatic Works' Azure Data Week 2019: Data Quality Patterns in the Cloud with Azure Data Factory using Mapping Data Flows
How to use Azure Machine Learning service to manage the lifecycle of your models. Azure Machine Learning uses a Machine Learning Operations (MLOps) approach, which improves the quality and consistency of your machine learning solutions.
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Introduction to DataOps and AIOps (or MLOps) (Adrien Blind)
This presentation introduces the audience to the DataOps and AIOps practices. It deals with organizational and tech aspects, and provides hints to start your data journey.
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud (DataWorks Summit)
The world’s largest enterprises run their infrastructure on Oracle, DB2, and SQL, and their critical business operations on SAP applications. Organisations need this data to be available in real time to conduct necessary analytics. However, delivering this heterogeneous data at the speed it’s required can be a huge challenge because of the complex underlying data models and structures and legacy manual processes which are prone to errors and delays.
Unlock these silos of data and enable the new advanced analytics platforms by attending this session.
Find out how to:
• Overcome common challenges faced by enterprises trying to access their SAP data
• Integrate SAP data in real time with change data capture (CDC) technology
• Stream SAP data into Kafka with Attunity Replicate for SAP, as other organisations are doing
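Change data capture, in essence, turns successive table states into a stream of insert/update/delete events. This snapshot-diff sketch illustrates the idea only; tools like Attunity Replicate read the database's transaction log rather than diffing snapshots:

```python
def capture_changes(old, new):
    """Diff two {key: row} snapshots into CDC-style (op, key, row) events."""
    events = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))
        elif old[key] != row:
            events.append(("update", key, row))
    for key in old:
        if key not in new:
            events.append(("delete", key, old[key]))
    return events

# Two successive snapshots of a hypothetical SAP order table.
old = {1: {"name": "order A"}, 2: {"name": "order B"}}
new = {1: {"name": "order A*"}, 3: {"name": "order C"}}
for event in capture_changes(old, new):
    print(event)  # each event could then be published to a Kafka topic
```

Log-based CDC is preferred in practice because it sees every intermediate change and puts no query load on the source system, but the output shape, an ordered stream of row-level events, is the same.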
Speakers:
John Hol, Regional Director, Attunity
Mike Hollobon, Director Business Development, IBT
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Build the data lake, but avoid building the data swamp! The tool ecosystem is building up around the data lake, and soon many organizations will have both a robust lake and a data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage (Precisely)
The data fueling your AI or machine learning initiatives plays a critical role. Different data sources provide different outcomes. The most important thing a business can do to prepare for success with AI and machine learning is to understand and provide access to all of the data that you can possibly get to. In addition to newer data sources, like IoT and Social Media, what will set your results apart – and give your business a competitive advantage – is powering AI and machine learning with your historical and proprietary data: the data sitting in your mainframe, legacy, and other traditional systems.
View this on-demand webcast with Wikibon Analyst James Kobielus as we discuss:
• Using your historical customer data to train predictive AI/ML models for effective target marketing
• Leveraging social, mobile, and IoT data to give your marketing an extra level of personalization
• Making the most of your legacy and proprietary data while protecting customer privacy and ensuring regulatory compliance
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional sits squarely on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and to originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
How to analyze binary data as a technical business user. Use InfoSphere BigInsights to bring analytics on Hadoop closer to a user.
Presented at the OOP conference in Munich, 27.01.2015
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
3 Things to Learn About:
*Building scalable real time architectures for managing data from IoT
*Processing data in real time with components such as Kudu & Spark
*Customer case studies highlighting real-time IoT use cases
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...PwC
Hadoop Summit is an industry-leading Hadoop community event for business leaders and technology experts (such as architects, data scientists and Hadoop developers) to learn about the technologies and business drivers transforming data. PwC is helping organizations unlock their data possibilities to make data-driven decisions.
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
This presentation will cover Cloud history and Microsoft Azure Data Analytics capabilities. Moreover, it has a real-world example of DW modernization. Finally, we will check the alternative solution on Azure using Snowflake and Matillion ETL.
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
IBM Cloud Private for Data, the ultimate platform for all AI, ML, and Data Science workloads. An integrated analytics platform based on containers and microservices, it works with Kubernetes and Docker, even with Red Hat OpenShift, and delivers a variety of business use cases across industries: financial services, telco, retail, manufacturing, etc.
Any data source becomes an SQL query with all the power of Apache Spark. Querona is a virtual database that seamlessly connects any data source with Power BI, TARGIT, Qlik, Tableau, Microsoft Excel, and others. It lets you build your own universal data model and share it among reporting tools.
Querona does not create another copy of your data, unless you want to accelerate your reports using its built-in execution engine, created for the purpose of Big Data analytics. Just write a standard SQL query and let Querona consolidate data on the fly, use one of its execution engines, and accelerate processing no matter what kind of sources you have, or how many.
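The federated-query idea sketched above can be shown with a toy example: join rows from two different "sources" on demand instead of first copying them into one store. Plain pandas stands in for the virtual-database engine here; nothing below is Querona's actual API.

```python
# Toy illustration of on-the-fly data consolidation across two sources.
# pandas is only a stand-in for a federated query engine.
import pandas as pd

# Source 1: e.g. rows pulled from a CRM database.
crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "name": ["Alice", "Bob", "Carol"]})

# Source 2: e.g. an orders feed from a separate system.
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [120.0, 80.0, 42.5]})

# Conceptually: SELECT name, SUM(amount)
#               FROM crm JOIN orders USING (customer_id) GROUP BY name
report = (crm.merge(orders, on="customer_id")
             .groupby("name", as_index=False)["amount"].sum())
print(report)
```

A virtual database does the same kind of join and aggregation at query time, pushing work down to each source's own engine where it can.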
Options for Data Prep - A Survey of the Current MarketDremio Corporation
Data comes in many shapes and sizes, and every company struggles to find ways to transform, validate, and enrich data for multiple purposes. The problem has been around as long as data itself, and the market offers an overwhelming number of options. In this presentation we look at the problem and the key options from vendors in the market today. Dremio is a new approach that eliminates the need for stand-alone data prep tools.
Looking to make your document processing operations more effective and cost-efficient with AI/ML? Learn from the experts of Provectus and Amazon Web Services (AWS) how to choose the right solution for your company! We will look into the management and engineering perspectives of AI document processing, from industry use cases and the solution map to our unique methodology for assessing available document processing solutions to Provectus IDP. Whether you are looking for a ready-made solution or you plan to build a custom solution of your own, this webinar will help you find the best option for your business.
Agenda
- Introductions
- Industry use cases
- Intelligent Document Processing (IDP) overview
- IDP Solutions map
- AWS IDP Solution
- Provectus IDP Platform
- Q&A
Intended Audience
Technology executives and decision makers, including such roles as CIO, CCO, COO, and CDO; digital transformation managers; data and ML engineers.
Presenters
Almir Davletov, IDP Subject Matter Expert, Provectus
Yaroslav Tarasyuk, Business Development, Provectus
Sonali Sahu, Sr. Solutions Architect, AWS
Interested? Learn more about Provectus Intelligent Document Processing Solution: https://provectus.com/document-processing-solution/
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.Provectus
Healthcare organizations generate piles of documents and forms in different formats, making it difficult to achieve operational excellence and streamline business processes. Manual entry and OCR are no longer viable, and healthcare entities are looking for new solutions to handle documents.
In this presentation you can learn about:
- Healthcare document types and use cases
- IDP framework: building blocks for document processing solutions
- The document processing market landscape
- Methodology for solution evaluation: comparing apples to apples
Whether you are looking for a ready-made solution or plan to build a custom solution of your own, this webinar will help you find the best fit for your healthcare use cases.
Choosing the Right Document Processing Solution for Healthcare OrganizationsProvectus
Looking to automate document processing in your healthcare organization? Learn from Provectus & AWS experts how to make data capture, conversion, and analytics more efficient. Process and manage documents faster and on a larger scale with AI & Machine Learning.
In this presentation, we offer management and engineering perspectives on document processing with AI, to help you explore available options. Whether you are looking for a ready-made solution or plan to build a custom solution of your own, this webinar will help you find the best fit for your healthcare use cases.
AI Stack on AWS: Amazon SageMaker and BeyondProvectus
Looking to learn more about AWS AI stack? Join experts from Provectus & AWS to find out how to use Amazon SageMaker (with combination with other tools and services) to enable enterprise-wide AI.
Companies are looking to scale and become more productive when it comes to AI and data initiatives. They seek to launch AI projects more rapidly, which, among many other factors, requires a robust machine learning infrastructure. In this webinar, you will learn how to create a canonical SageMaker workflow, expand the SageMaker workflow to a holistic implementation, enhance and expand the implementation using best practices for feature store, data versioning, ML pipeline orchestration, and model monitoring.
Agenda
- Introductions
- Amazon SageMaker Overview
- Real-World Use Case
- Data Lake for Machine Learning
- Amazon SageMaker Experiments
- Orchestration Beyond SageMaker Experiments
- Amazon SageMaker Debugger
- Amazon SageMaker Model Monitor
- Webinar Takeaways
Intended audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Pritpal Sahota, Technical Account Manager, Provectus
- Christopher A. Burns, Sr. AI/ML Solution Architect, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/ai-stack-on-aws-sagemaker-and-beyond-mar-2020/
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
Looking to implement MLOps using AWS services and Kubeflow? Come and learn about machine learning from the experts of Provectus and Amazon Web Services (AWS)!
Businesses recognize that machine learning projects are important, but such projects go beyond just building and deploying models, which is most of what organizations do today. Successful ML projects entail a complete lifecycle involving ML, DevOps, and data engineering, and are built on top of ML infrastructure.
AWS and Amazon SageMaker provide a foundation for building infrastructure for machine learning while Kubeflow is a great open source project, which is not given enough credit in the AWS community. In this webinar, we show how to design and build an end-to-end ML infrastructure on AWS.
Agenda
- Introductions
- Case Study: GoCheck Kids
- Overview of AWS Infrastructure for Machine Learning
- Provectus ML Infrastructure on AWS
- Experimentation
- MLOps
- Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Qingwei Li, ML Specialist Solutions Architect, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-mlops-and-reproducible-ml-on-aws-with-kubeflow-and-sagemaker-aug-2020/
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRProvectus
Considering new ways and options for reducing operational costs and scaling flexibility of your Apache Hadoop/Spark? Try migrating to Amazon EMR!
On-premises Apache Hadoop/Spark clusters are among the top sources of financial pressure for businesses. IT organizations want to reduce spend while still meeting demand, to keep their legacy data applications up and running. Come and learn from experts at Provectus & AWS how you can use Amazon EMR to start driving cost efficiencies in your organization!
Agenda
- Hadoop market and cost optimizations using Amazon EMR
- Cost related and other challenges of on-prem Hadoop clusters
- Cost optimizations by using Amazon EMR and migration best practices
Intended audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Pritpal Sahota, Technical Account Manager, Provectus
- Nirav Shah, Senior Solutions Architect, AWS
- Perry Peterson, Business Development Manager, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/cost-optimization-for-apache-hadoop-spark-workloads-with-amazon-emr-june-2020/
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...Provectus
What's a machine learning workflow? What open source tools can you use to automate ML workflow?
Reproducible ML pipelines in research and production with monitoring insights from live inference clusters could enable and accelerate the delivery of AI solutions for enterprises. There is a growing ecosystem of tools that augment researchers and machine learning engineers in their day to day operations.
Still, there are big gaps in the machine learning workflow when it comes to training dataset versioning, training performance and metadata tracking, integration testing, inferencing quality monitoring, bias detection, concept drift detection and other aspects that prevent the adoption of AI in organizations of all sizes.
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...Provectus
AWS Dev Day Kyiv 2019
Track: Analytics & Machine Learning
Session: "Building a Modern Data platform in the Cloud"
Speaker: Alex Casalboni, AWS Technical Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/HIDnAG9AxZo
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...Provectus
AWS Dev Day Kyiv 2019
Track: Modern Application Development
Session: "How to build a global serverless service"
Speaker: Alex Casalboni, AWS Technical Evangelist
Level: 400
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/Q19B-NTkMfk
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "Automating AWS Infrastructure with PowerShell"
Speaker: Martin Beeby, AWS Principal Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/rgIjjK2J4dQ
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...Provectus
AWS Dev Day Kyiv 2019
Track: Analytics & Machine Learning
Session: "Analyzing your web and application logs"
Speaker: Javier Ramirez, AWS Technical Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/IpEhEs1sXeg
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "Resiliency and Availability Design Patterns for the Cloud"
Speaker: Sebastien Stormacq, AWS Technical Evangelist
Level: 400
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/O8gonQCJawU
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "Architecting SaaS solutions on AWS"
Speaker: Oleksandr Mykhalchuk, Director of DevOps & Cloud Services at SoftServe
Level: 300
Video: https://youtu.be/3lKoe-ts8Qs
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019Provectus
AWS Dev Day Kyiv 2019
Track: Modern Application Development
Session: "Developing with .NET Core on AWS"
Speaker: Martin Beeby, AWS Principal Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/OzM8L7H1LmA
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "How to build real-time backends"
Speaker: Martin Beeby, AWS Principal Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/bsZYA6V3bDA
"Integrate your front end apps with serverless backend in the cloud", Sebasti...Provectus
AWS Dev Day Kyiv 2019
Track: Modern Application Development
Session: "Integrate your front end apps with serverless backend in the cloud"
Speaker: Sebastien Stormacq, AWS Technical Evangelist
Level: 200
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://www.youtube.com/watch?v=6z43H11qoU8&t=1s
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019Provectus
AWS Dev Day Kyiv 2019
Track: Analytics & Machine Learning
Session: "Scaling ML from 0 to millions of users"
Speaker: Julien Simon, Global AI & Machine Learning Evangelist at AWS
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://www.youtube.com/watch?v=N73u1mx9DqY
How to implement authorization in your backend with AWS IAMProvectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "How to implement authorization in your backend with AWS IAM"
Speaker: Stas Ivaschenko, AWS solutions architect at Provectus
Level: 400
Video: https://www.youtube.com/watch?v=4Jje_WJ4V7Q
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front end. I have also often seen developers implement front-end features by just following the standard rules of a framework, thinking that this is enough to launch the project successfully, and then the project fails. How can you prevent this, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, machine learning over just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Feature Store as a Data Foundation for Machine Learning
1. Feature Store as a Data Foundation for ML
Presented by:
Stepan Pushkarev, CTO @ Provectus
Gandhi Raketla, Senior Solutions Architect @ AWS
2. Agenda
1. Introductions
2. Modern Data Lakes and Modern ML Infrastructure
3. Emerging Architectural Shifts
4. Feature Store: 200 LOD overview and reference architecture on AWS
5. AWS Perspective on Feature Store
4. AI-First Consultancy & Solutions Provider
Clients ranging from fast-growing startups to large enterprises. 450 employees and growing. Established in 2010; HQ in Palo Alto, with offices across the US, Canada, and Europe.
We are obsessed with leveraging cloud, data, and AI to reimagine the way businesses operate, compete, and deliver customer value.
5. Our Clients
● Innovative Tech Vendors: seeking niche expertise to differentiate and win the market
● Midsize to Large Enterprises: seeking to accelerate innovation and achieve operational excellence
8. Common Challenges: Data Access and Discoverability
1. Data is scattered across multiple data sources and technologies
2. Tedious process of managing AWS IAM roles, Amazon S3 policies, API Gateways, and database permissions
3. Gets even more complicated in an AWS multi-account setup
4. Metadata is not discoverable
5. As a result, all the investments into Data and ML are undermined by data access issues
9. Common Challenges: Monolithic Data Teams
1. Lack of ownership and domain context: a disconnect between data producers and data consumers
2. Backlogged data teams struggling to keep pace with business demands
3. No contracts between Data and ML Engineering
4. As a result, fast end-to-end experimentation is killed by complex dependencies between teams
https://martinfowler.com/articles/data-monolith-to-mesh.html
10. Common Challenges: ML Experimentation Infrastructure
1. Inherited issues with Data Discovery and Data Access
2. Reproducibility of datasets, ML pipelines, ML environments, and offline experiments is still an issue
3. Production experimentation frameworks are still fairly immature
4. As a result, the cost of an end-to-end experiment, from data to a production ML metric, is 3-6 months
https://hbr.org/2020/03/building-a-culture-of-experimentation
11. Common Challenges: Scaling ML Adoption in Production
1. Online serving: there is no unified and consistent way to access features during model serving
2. Impossible to reuse features across multiple training pipelines and ML applications
3. Monitoring and maintenance of ML applications
4. As a result, the time and cost to scale from 1 to 100 models in production grows exponentially
What is your cost per ML model in production?
13. Emerging Architectural Shifts
● Data Lake -> Hudi/Delta Lakes: Hudi/Delta Lakes bring managed ingestion, ACID transactions, and point-in-time queries to traditional Data Lakes
● Data Lake -> Data Mesh: ownership of data domains, data pipelines, metadata, and APIs is shifting from centralized teams to product teams
● Data Lake -> Data Infrastructure as a Platform: unified, reusable platform components and frameworks across the enterprise
● Endpoint Protection -> Global Data Governance: data security and privacy measures are becoming centralized as part of the Data Platform
● Metadata Store -> Global Data Catalog: the user experience around data discovery, lineage, and versioning requires investment in a metadata-rich Data Catalog
● Feature Store: scaling ML experimentation and operations requires a separate data management layer for ML features
● ML Toolkit -> Complete ML Infrastructure: ML capabilities are democratized for ML Engineers and citizen Data Scientists
14. ACID Data Lakes (Delta/Hudi Lakes)
● Managed ingestion
● Dataset versioning for ML training
● Cheap "deletes" (a common GDPR use case)
● Audit log of any changes in datasets
● Brings ACID transactions into your data lake
● "Upserts" strategy on data ingestion
● Enables schemas to enforce data quality
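The upsert and point-in-time behaviour listed above can be illustrated with a toy, in-memory sketch. This is not the actual Delta or Hudi API; the class and method names are invented for illustration only:

```python
import copy

class ToyAcidTable:
    """In-memory illustration of upserts and point-in-time ("time travel")
    reads, mimicking what Delta/Hudi layer on top of a data lake."""

    def __init__(self):
        self._versions = []  # each commit stores a full snapshot keyed by row key

    def upsert(self, rows, key):
        """Insert new rows or overwrite existing ones by key, as one atomic commit."""
        current = copy.deepcopy(self._versions[-1]) if self._versions else {}
        for row in rows:
            current[row[key]] = row
        self._versions.append(current)
        return len(self._versions) - 1  # commit/version id

    def read(self, version=None):
        """Read the latest snapshot, or time-travel to an older version."""
        if not self._versions:
            return []
        snapshot = self._versions[-1 if version is None else version]
        return list(snapshot.values())

table = ToyAcidTable()
v0 = table.upsert([{"id": 1, "clicks": 10}], key="id")
v1 = table.upsert([{"id": 1, "clicks": 12}, {"id": 2, "clicks": 3}], key="id")
# The latest read sees the upserted value, while version v0 still sees the old one.
```

Keeping every commit addressable by version is what makes dataset versioning for ML training and audit logs possible in the real systems.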
15. Global Data Governance
● Accelerate privacy operations with data you already have. Automate business processes, data mapping, and PI discovery and classification for privacy workflows.
● Operationalize policies in a central location. Govern privacy policies to ensure they are effectively managed across the enterprise. Define and document workflows, traceability views, and business process registers.
● Scale compliance across multiple regulations. Use a platform designed and built with privacy in mind that is easily extensible to support new regulations.
AWS services: AWS Config, AWS Lake Formation
16. Global Data Catalog
A meta-metadata store:
● Does this data exist? Where is it?
● What is the source of truth for this data?
● Do I have access?
● Who is the owner?
● Who are the users of this data?
● Are there existing assets I can reuse?
● Can I trust this data?
* There are no established leaders in open source
17. The Core of MLOps and Reproducible Experimentation Pipelines
(Diagram: model code, ML pipeline code, and infrastructure as code feed automated pipeline execution with pipeline metadata; versioned datasets and a Feature Store supply the data; outputs are model artifacts, a prediction service, ML metrics, production metrics, alerts, and reports. Orchestration provides idempotent execution, with a feedback loop for production data.)
19. Feature Store Value Proposition
A data management layer for machine learning features.
1. Better ROI from feature engineering through a lower cost per model: facilitates collaboration, sharing, and reuse of features
2. Faster time to market for new models through increased productivity of ML Engineers: decouples the storage implementation from the feature serving API
20. Feature Store: Canonical Use Cases
● Personalization & Recommendation Engines
● Dynamic Pricing Optimization
● Supply Chain Optimization
● Logistics and Transportation Optimization
● Fraud Detection
● Predictive Maintenance
● Demand Forecasting
* All use cases where ML models need a stateful, ever-changing representation of the system
21. Feature Store: Concepts
● Online Feature Store: online applications look up a feature vector that is sent to an ML model for predictions
● ML-specific Metadata: enables feature discoverability and reuse
● ML-specific API and SDK: high-level operations for fetching training feature sets and for online access
● Materialized Versioned Datasets: maintains versions of the feature sets used to train ML models
(Diagram: Raw Data -> Feature Engineering -> Feature Store -> Training, Serving, Discovery)
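The four concepts above can be made concrete with a minimal in-memory sketch. All names here are hypothetical; a production feature store would back these paths with real offline and online stores:

```python
class MiniFeatureStore:
    """Toy feature store: a registry with metadata (discovery), an online
    keyed lookup (serving), and versioned materialized snapshots (training)."""

    def __init__(self):
        self.registry = {}   # feature set name -> metadata, for discovery
        self.online = {}     # (feature set, entity id) -> feature vector
        self.snapshots = {}  # (feature set, version) -> materialized rows

    def register(self, name, description, owner):
        """ML-specific metadata: make the feature set discoverable."""
        self.registry[name] = {"description": description, "owner": owner}

    def ingest(self, name, entity_id, features):
        self.online[(name, entity_id)] = features

    def get_online_features(self, name, entity_id):
        """Serving path: fetch one feature vector for a prediction request."""
        return self.online[(name, entity_id)]

    def materialize(self, name, version):
        """Training path: freeze the current features as a versioned dataset."""
        rows = [dict(entity_id=e, **f)
                for (n, e), f in self.online.items() if n == name]
        self.snapshots[(name, version)] = rows
        return rows

store = MiniFeatureStore()
store.register("user_stats", "aggregates per user", owner="ml-team")
store.ingest("user_stats", "u1", {"purchases_1h": 4.0})
vector = store.get_online_features("user_stats", "u1")
training_set = store.materialize("user_stats", version="v1")
```

The point of the sketch is the separation of paths: the same ingested features serve both low-latency online lookups and versioned training snapshots, which is what keeps training and serving consistent.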
23. Feast
Pros:
● Battle-tested with GoJek, Farfetch, Postmates, and Zulily
● Integrated with Kubeflow
● Good community
Cons (to be addressed in the roadmap):
● GCP only
● Infrastructure-heavy
● Lacks composability
● No data versioning
* Now backed by Tecton
* https://blog.feast.dev/post/a-state-of-feast
(Diagram: an Ingestion API feeds an Offline Store (BigQuery) for historical serving and training, and an Online Store (Redis) for online serving; a Feature Registry supports discovery.)
24. Hopsworks
Pros:
● Integrates with most Python libs for ingestion and training
● Supports an offline store with time travel
● AWS / GCP / Azure / on-prem ready
Cons:
● Hard to use outside of the HopsML infrastructure
● The online store might not fit all latency requirements
* Online serving is part of the Enterprise version
(Diagram: Spark and Pandas ingestion APIs feed an Offline Store (Hudi/Hive) for historical serving and training, and an Online Store (MySQL) for online serving; a Feature Registry supports discovery.)
27. Lessons Learned
1. Start by designing a consistent ACID Data Lake before investing in a Feature Store
2. The value from existing open-source products does not justify the investment in integration and the dependencies they bring
3. A Feature Store must not bring in new infrastructure and data storage solutions. It has to be a lightweight API and SDK integrated into your existing data infrastructure.
4. Data Catalog, Data Governance, and Data Quality components are horizontal to the whole Data Infrastructure, including the Feature Store
5. There are no mature open-source or cloud solutions for Global Data Catalog and Data Quality monitoring
28. Data Infrastructure with Feature Store
(Diagram: Raw Data and Event Data flow through Stream Processing into Hot Storage and through Batch Processing into Cold Storage; a Feature Store API on top of both serves Training and Serving, alongside BI Tools and an API; Workflow Automation, Data Catalog, Data Quality, and Data Security run as horizontal components.)
31. Recommended Strategy
Recommendations for going forward with a Feature Store:
1. Make sure your existing data infrastructure covers 90% of the Feature Store requirements (streaming ingestion, consistency, catalog, versioning)
2. Build a lightweight in-house Feature Store API on top of your existing storage solutions
3. Collaborate with the community and cloud vendors to maintain compatibility with standards and the state-of-the-art ecosystem
4. Be ready to migrate to a managed service or an open-source alternative as the market matures
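A lightweight in-house Feature Store API can stay a thin abstraction over storage you already run, which also keeps the later migration to a managed service cheap. A sketch using Python protocols, with in-memory stand-ins for the backends (all class and method names are illustrative):

```python
from typing import Protocol

class OnlineStore(Protocol):
    def get(self, entity_id: str) -> dict: ...
    def put(self, entity_id: str, features: dict) -> None: ...

class OfflineStore(Protocol):
    def scan(self) -> list[dict]: ...

class DictOnlineStore:
    """Stand-in for DynamoDB/ElastiCache in a real deployment."""
    def __init__(self):
        self._data = {}
    def get(self, entity_id):
        return self._data[entity_id]
    def put(self, entity_id, features):
        self._data[entity_id] = features

class ListOfflineStore:
    """Stand-in for S3 + Athena/Hudi in a real deployment."""
    def __init__(self, rows):
        self._rows = rows
    def scan(self):
        return list(self._rows)

class FeatureStoreAPI:
    """The only layer ML code talks to; backends stay swappable."""
    def __init__(self, online: OnlineStore, offline: OfflineStore):
        self.online, self.offline = online, offline
    def serve(self, entity_id):
        return self.online.get(entity_id)
    def training_rows(self):
        return self.offline.scan()

api = FeatureStoreAPI(DictOnlineStore(), ListOfflineStore([{"id": "u1", "x": 1.0}]))
api.online.put("u1", {"x": 1.0})
```

Because training and serving code depend only on the protocols, swapping `DictOnlineStore` for a DynamoDB-backed implementation later changes no ML code.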
36. Amazon DynamoDB
Fast and flexible key-value database service for any scale
● Performance at scale: consistent, single-digit-millisecond response times at any scale; build applications with virtually unlimited throughput
● Serverless architecture: no hardware provisioning, software patching, or upgrades; scales up or down automatically; continuously backs up your data
● Global replication: build global applications with fast access to local data by easily replicating tables across multiple AWS Regions
● Enterprise security: encrypts all data by default and fully integrates with AWS Identity and Access Management for robust security
37. Amazon ElastiCache
Managed Redis- or Memcached-compatible in-memory data store
● Consistent high performance: in-memory data store and cache for sub-millisecond response times
● Unlimited scale: read scaling with replicas; write and memory scaling with sharding; nondisruptive scaling
● Fully managed: AWS manages all hardware and software setup, configuration, and monitoring
38. Amazon Aurora
MySQL- and PostgreSQL-compatible relational database built for the cloud
● Performance & scalability: 5x the throughput of standard MySQL and 3x that of standard PostgreSQL; scale out up to 15 read replicas
● Availability & durability: fault-tolerant, self-healing storage; 6 copies of data across 3 AZs; continuous backup to Amazon S3
● Highly secure: network isolation, encryption at rest and in transit
● Fully managed: managed by Amazon RDS; no server provisioning, software patching, setup, configuration, or backups on your part
42. Amazon Athena
Serverless, interactive query service for analytics
● Pay per query: pay only for the queries you run; save 30-90% on per-query costs through compression
● Uses S3 storage: zero setup cost; point to S3 and start querying
● ANSI SQL: JDBC/ODBC drivers; multiple formats, compression types, and complex joins and data types
● Easy: serverless, with zero infrastructure and zero administration; integrated with QuickSight; query instantly
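Because Athena speaks ANSI SQL over S3, it can assemble training sets for the offline feature path directly. A sketch of building a point-in-time-correct query string, where each label row is joined to the latest feature row at or before its timestamp; the table and column names are purely illustrative:

```python
def point_in_time_query(label_table: str, feature_table: str,
                        entity_col: str, ts_col: str) -> str:
    """Render an ANSI-SQL query that joins each label row to the most recent
    feature row at or before the label timestamp (avoids feature leakage)."""
    return f"""
SELECT l.*, f.*
FROM {label_table} l
JOIN {feature_table} f
  ON f.{entity_col} = l.{entity_col}
 AND f.{ts_col} = (
     SELECT MAX(f2.{ts_col})
     FROM {feature_table} f2
     WHERE f2.{entity_col} = l.{entity_col}
       AND f2.{ts_col} <= l.{ts_col}
 )
""".strip()

sql = point_in_time_query("labels", "user_features", "user_id", "event_ts")
```

The correlated subquery is what enforces point-in-time correctness: no feature value observed after the label's timestamp can leak into the training row.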
43. Questions, details? We would be happy to answer!
125 University Avenue, Suite 290, Palo Alto, California 94301
provectus.com