Abstract: Data preparation and modelling are the activities that consume most of the time in a typical data scientist's workday. In this session we'll see how AWS services for analytics and data management can be effectively used and integrated in AI/ML pipelines. We'll focus on AWS Glue, AWS Glue DataBrew and AWS Data Wrangler, with a bit of theory and hands-on demos.
Bio:
Francesco Marelli is a senior solutions architect at Amazon Web Services. He has lived and worked in the UK, Italy, Switzerland and other countries in EMEA. He specializes in the design and implementation of analytics, data management and big data systems. Francesco also has strong experience in systems integration and in the design and implementation of applications.
Topics: machine learning pipelines, AWS, cloud.
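As a flavour of what the session covers, here is a minimal sketch using the AWS Data Wrangler Python library (the awswrangler package); the bucket, database, and table names are placeholders.

```python
import awswrangler as wr
import pandas as pd

# Read raw CSV data from S3 into a pandas DataFrame (placeholder bucket/path)
df = wr.s3.read_csv(path="s3://example-bucket/raw/events/")

# Typical preparation steps before model training
df["event_date"] = pd.to_datetime(df["event_date"])
df = df.dropna(subset=["user_id"])

# Write back as partitioned Parquet and register the table in the Glue Data Catalog
wr.s3.to_parquet(
    df=df,
    path="s3://example-bucket/curated/events/",
    dataset=True,
    database="example_db",
    table="events",
)

# Query the curated table with Athena, straight into pandas
features = wr.athena.read_sql_query(
    "SELECT user_id, COUNT(*) AS n_events FROM events GROUP BY user_id",
    database="example_db",
)
```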
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap with one another. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
In this session we will introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We will also discuss how to build scalable, efficient, and serverless ETL pipelines.
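For context, the following is a minimal AWS Glue PySpark job skeleton of the kind such sessions discuss; the catalog database, table, and output path are placeholders.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (placeholder names)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders"
)

# Drop a malformed-records field and write the result to S3 as Parquet
cleaned = dyf.drop_fields(["_corrupt_record"])
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```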
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Considerations for Data Access in the Lakehouse - Databricks
Organizations are increasingly exploring lakehouse architectures with Databricks to combine the best of data lakes and data warehouses. Databricks SQL Analytics introduces new innovation on the “house” to deliver data warehousing performance with the flexibility of data lakes. The lakehouse supports a diverse set of use cases and workloads that require distinct considerations for data access. On the lake side, tables with sensitive data require fine-grained access controls that are enforced across the raw data and the derivative data products created via feature engineering or transformations. On the house side, tables can require fine-grained data access, such as row-level segmentation for data sharing, and additional transformations using analytics engineering tools. On the consumption side, there are additional considerations for managing access from popular BI tools such as Tableau, Power BI or Looker.
The product team at Immuta, a Databricks partner, will share their experience building data access governance solutions for lakehouse architectures across different data lake and warehouse platforms to show how to set up data access for common scenarios for Databricks teams new to SQL Analytics.
Build a simple data lake on AWS using a combination of services, including AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon S3.
Link to the blog post and video: https://garystafford.medium.com/building-a-simple-data-lake-on-aws-df21ca092e32
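As an illustration of the pattern from the post, here is a hedged boto3 sketch that crawls an S3 prefix into the Glue Data Catalog and then queries the result with Athena; the crawler name, IAM role, database, and paths are all placeholders.

```python
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Catalog whatever lands under the raw prefix (placeholder role/bucket)
glue.create_crawler(
    Name="raw-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="datalake_db",
    Targets={"S3Targets": [{"Path": "s3://example-datalake/raw/"}]},
)
glue.start_crawler(Name="raw-data-crawler")

# Once the crawler has run, query the discovered table with Athena
athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM raw_events",
    QueryExecutionContext={"Database": "datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://example-datalake/athena-results/"},
)
```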
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
This presentation discusses big data challenges and provides an overview of the AWS Big Data Platform, covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora: a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
AWS Secrets Manager: Best Practices for Managing, Retrieving, and Rotating Secrets - Amazon Web Services
In this session, learn how to use AWS Secrets Manager to simplify secrets management and empower your developers to move quickly while raising the security bar in your organization. Also, learn how you can use these changes to more easily meet your compliance requirements. Finally, learn how the service enables you to control access to secrets using fine-grained permissions and centrally audit secret rotation for resources in the AWS Cloud, third-party services, and on-premises.
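A minimal boto3 sketch of the retrieval pattern described here; the secret name and its keys are placeholders.

```python
import json
import boto3

client = boto3.client("secretsmanager")

# Fetch the current version of a secret (placeholder name)
response = client.get_secret_value(SecretId="prod/app/db-credentials")
creds = json.loads(response["SecretString"])

# Use the values instead of hard-coding credentials in the application
print(creds["username"])
```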
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of the core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with the implementation of Data Mesh systems and focus on the role of open-source projects in it. Projects like Apache Spark can play a key part in a standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Build real-time streaming data pipelines to AWS with Confluent - Confluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
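A minimal sketch of the produce side of such a pipeline using the confluent-kafka Python client; the bootstrap server, topic, and credentials are placeholders for a Confluent Cloud cluster.

```python
from confluent_kafka import Producer

# Connection details are placeholders for a Confluent Cloud cluster on AWS
producer = Producer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "API_KEY",
    "sasl.password": "API_SECRET",
})

def delivery_report(err, msg):
    # Called once per message to confirm delivery or surface an error
    if err is not None:
        print(f"Delivery failed: {err}")

producer.produce("orders", key="order-42", value='{"amount": 19.99}',
                 callback=delivery_report)
producer.flush()
```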
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since the service launched five years ago, AWS customers have launched more than 5.5 million Hadoop clusters.
In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost efficient.
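A hedged boto3 sketch of the transient (short-lived) cluster pattern mentioned above: the cluster reads from and writes to S3 rather than HDFS and terminates when its steps finish; the names, roles, and paths are placeholders.

```python
import boto3

emr = boto3.client("emr")

response = emr.run_job_flow(
    Name="nightly-transient-cluster",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}],
    LogUri="s3://example-bucket/emr-logs/",
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Transient pattern: terminate the cluster once all steps complete
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "spark-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/etl.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```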
Speakers:
Ian Meyers, AWS Solutions Architect
Ian McDonald, IT Director, SwiftKey
On-premise to Microsoft Azure Cloud Migration - Emtec Inc.
This presentation sheds light on migrating on-premises apps to the Microsoft Azure cloud. It also highlights the technical capabilities of Microsoft Azure cloud services.
Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, APIs, clickstreams, and unstructured and log data. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. In this session, we introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We discuss how to build scalable, efficient, and serverless ETL pipelines using AWS Glue. Additionally, Merck will share how they built an end-to-end ETL pipeline for their application release management system, and launched it in production in less than a week using AWS Glue.
AWS Glue is a fully managed, serverless extract, transform, and load (ETL) service that makes it easy to move data between data stores. AWS Glue simplifies and automates the difficult and time-consuming tasks of data discovery, conversion mapping, and job scheduling so you can focus more of your time on querying and analyzing your data using Amazon Redshift Spectrum and Amazon Athena. In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
Azure Synapse is Microsoft's new cloud analytics service offering that combines enterprise data warehouse and Big Data analytics capabilities. It offers a powerful and streamlined platform to facilitate the process of consolidating, storing, curating and analysing your data to generate reliable and actionable business insights.
Amazon EMR enables fast processing of large structured or unstructured datasets, and in this presentation we'll show you how to set up an Amazon EMR job flow to analyse application logs and perform Hive queries against them. We also review best practices around data file organisation on Amazon Simple Storage Service (S3), how clusters can be started from the AWS web console and command line, and how to monitor the status of a Map/Reduce job.
Finally we take a look at Hadoop ecosystem tools you can use with Amazon EMR and the additional features of the service.
See a recording of the webinar based on this presentation on YouTube here:
Check out the rest of the Masterclass webinars for 2015 here: http://aws.amazon.com/campaigns/emea/masterclass/
See the Journey Through the Cloud webinar series here: http://aws.amazon.com/campaigns/emea/journey/
AWS delivers an integrated suite of services that provide everything needed to quickly and easily build and manage a data lake for analytics. AWS-powered data lakes can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos and data warehouses cannot. In this session, we will show you how you can quickly build a data lake on AWS that ingests, catalogs and processes incoming data and makes it ready for analysis. Using a live demo, we demonstrate the capabilities of AWS-provided analytics services such as AWS Glue, Amazon Athena and Amazon EMR, and how to build a data lake on AWS step by step.
Data & Analytics re:Invent Recap [AWS Basel Meetup - Jan 2023] - Chris Bingham
After recapping the key data & analytics announcements from AWS re:Invent 2022, we look a little deeper at three key new services:
• AWS DataZone
• AWS Omics
• AWS Clean Rooms
We follow up with a demo that uses AWS IoT ExpressLink hardware in conjunction with AWS IoT Core, Lambda, and Amplify to build a Gatsby web app that interacts with the AWS IoT ExpressLink demo badge via a device shadow.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
AWS Cloud Adoption Framework and Workshops - Tom Laszewski
The presentation covers the AWS Cloud Adoption Framework (CAF). AWS CAF helps organizations accelerate their cloud adoption journey. The framework includes six perspectives - business, people, governance, security, operations, and platform. These six perspectives are used during CAF Envision, Alignment, and Cloud Capability Assessment workshops to enable the art of the possible, identify and mitigate organizational and technology impediments, and score the cloud capabilities of an organization.
Lessons from Migrating Oracle Databases to Amazon RDS or Amazon Aurora - Datavail
Learn and leverage database migration best practices for moving off commercial Oracle databases to Amazon RDS or Aurora. We’ll cover common pitfalls and issues, the biggest differences between the engines, migration best practices, and how some of our customers have completed these migrations.
Realize Value, Reduce Costs And Optimize the Value of Your Microsoft Investments - Amazon Web Services
Enterprises around the world are driving growth through innovation when they run Windows based solutions on the leading cloud platform. In addition, enterprises can significantly reduce total cost of ownership and optimize their costs when they choose AWS to host legacy and 3rd party Microsoft applications optimized for Windows Server and SQL Server by taking advantage of our cutting-edge infrastructure, flexible pricing options and licensing solutions. AWS also offers solutions and programs that empower .NET developers to leverage their skills and tools to continue developing cutting edge solutions. So, whether you are migrating a small application or considering divesting an entire datacenter, AWS can scale and support hosting of Windows solutions that help you run your business today.
About the event
AWS Transformation Day is designed for enterprise organizations migrating to the cloud to become more responsive, agile and innovative, while staying secure and compliant. Join us for this one-day event and we’ll share our experiences of helping enterprise customers accelerate the pace of migration and adoption of strategic services.
Who should attend?
This event is recommended for IT and business leaders who are looking to create sustainable benefits and a competitive advantage by using the AWS Cloud. CIOs, CTOs, CISOs, CDOs, CFOs, IT leaders and IT professionals, enterprise developers, business decision makers, and finance executives.
Realize Value of Your Microsoft Investments - AWS Transformation Day Boston 2018 - Amazon Web Services
AWS Transformation Day is designed for enterprise organizations migrating to the cloud to become more responsive, agile and innovative, while staying secure and compliant. Join us for this one-day event and we’ll share our experiences of helping enterprise customers accelerate the pace of migration and adoption of strategic services.
Realize Value of Your Microsoft Investments - Transformation Day Montreal 2018 - Amazon Web Services
AWS Transformation Day is designed for enterprise organizations looking to make the move to the cloud in order to become more responsive, agile and innovative, while still staying secure and compliant.
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics - Informatica
This presentation is geared toward enterprise architects and senior IT leaders looking to drive more value from their data by learning about cloud data lake management.
As businesses focus on leveraging big data to drive digital transformation, technology leaders are struggling to keep pace with the high volume of data coming in at high speed and rapidly evolving technologies. What's needed is an approach that helps you turn petabytes into profit.
Cloud data lakes and cloud data warehouses have emerged as a popular architectural pattern to support next-generation analytics. Informatica's comprehensive AI-driven cloud data lake management solution natively ingests, streams, integrates, cleanses, governs, protects and processes big data workloads in multi-cloud environments.
What can you do with Serverless in 2020 - Boaz Ziniman
Serverless is always evolving (faster than any definition), and each year new capabilities simplify existing workloads and enable new applications to be implemented in an easier, more efficient way. At AWS, we have focused on improving observability, configuration management, function invocations, service integrations, and execution environments. Looking at some of the more recent updates, this session introduces the reasoning behind the new features and how to use them to reduce your architecture's complexity, including real-world examples of what AWS customers are doing, so that you can focus on creating value for YOUR customers.
Realize Value of Your Microsoft Investments - AWS Transformation Days Raleigh... - Amazon Web Services
Enterprises around the world are driving growth through innovation when they run Windows based solutions on the leading cloud platform. In addition, enterprises can significantly reduce total cost of ownership and optimize their costs when they choose AWS to host legacy and 3rd party Microsoft applications optimized for Windows Server and SQL Server by taking advantage of our cutting-edge infrastructure, flexible pricing options and licensing solutions. AWS also offers solutions and programs that empower .NET developers to leverage their skills and tools to continue developing cutting edge solutions. So, whether you are migrating a small application or considering divesting an entire datacenter, AWS can scale and support hosting of Windows solutions that help you run your business today.
AWS re:Invent 2016: Relational and NoSQL Databases on AWS: NBC, MarkLogic, an... - Amazon Web Services
Learn how the AWS Marketplace brings together customers who have challenges with ISVs who have solutions to those challenges. See how to use relational and NoSQL technologies on AWS to build enterprise and consumer apps. NBC used MarkLogic to deliver an award-winning app that can handle high traffic levels and unexpected usage spikes. NBC’s popular, Emmy-winning “SNL 40” was launched to celebrate the 40th anniversary of Saturday Night Live, and delivers four decades of sketches and performances. Hosted on AWS, the app, as well as a browser-based platform, is powered by the MarkLogic Enterprise NoSQL database. Come learn from the team who collaborated on this project how to run your own database on AWS, and how to integrate with Amazon RDS and other data stores. A world-recognized automotive brand needed to deliver real-time responses about their worldwide fleet vehicles. You will learn how they used a combination of AWS services and FileMaker Cloud (from FileMaker, an Apple subsidiary, procured through AWS Marketplace) to deliver high-scale dealer-facing applications.
Look Before You Leap: Migrating On-Premises Hadoop to AWS - DevOps.com
Lack of agility, excessive costs, and administrative overhead are convincing on-premises Spark and Hadoop customers to migrate to cloud native services on AWS. As you’re migrating these applications to the cloud, Unravel helps ensure you won’t be flying blind.
Similar to “Speed up data preparation for ML pipelines on AWS”:
How to use the Economic Complexity Index to guide innovation plans - Data Science Milan
In this talk Mauro Pelucchi will present the Economic Complexity Index (ECI) and the Product Complexity Index (PCI), two network measures that provide unique insights into economic development patterns. We will show how to compute these metrics and explore the network theory behind these indices (Hidalgo and Hausmann, 2009).
The measures are also related to various dimensionality reduction methods and can be used to determine distances between nodes based on their similarity. Finally, we will discover how to interpret these metrics to compare countries, markets and products, and to guide our plans in a data-driven context.
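To make the construction concrete, here is a small NumPy sketch of the eigenvalue formulation of the ECI from Hidalgo and Hausmann (2009); the matrix M is an invented, illustrative binary country-by-product matrix.

```python
import numpy as np

def eci(M):
    """Economic Complexity Index from a binary country x product matrix M
    (M[c, p] = 1 if country c exports product p with revealed advantage)."""
    kc = M.sum(axis=1).astype(float)   # diversity of each country
    kp = M.sum(axis=0).astype(float)   # ubiquity of each product
    # Country-country matrix: Mcc[c, c'] = sum_p M[c,p] * M[c',p] / (kc[c] * kp[p])
    Mcc = (M / kc[:, None]) @ (M / kp).T
    eigvals, eigvecs = np.linalg.eig(Mcc)
    # ECI is the eigenvector of the second-largest eigenvalue, standardized;
    # its sign is conventional and is usually fixed against a known ranking
    order = np.argsort(-eigvals.real)
    v = eigvecs[:, order[1]].real
    return (v - v.mean()) / v.std()

M = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
print(eci(M))
```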
"You don't need a bigger boat": serverless MLOps for reasonable companiesData Science Milan
It is indeed a wonderful time to build machine learning systems, as the growing ecosystem of tools and shared best practices makes even small teams incredibly productive at scale. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of an (almost) pure serverless and open-source approach, and showing how the entire toolchain works - from raw data to model serving - on a real-world dataset.
Finally, we argue that the crucial component for analyzing data pipelines is not the model per se, but the surrounding DAG, and present our proposal for producing automated "DAG cards" from Metaflow classes.
Bio:
Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo. When not busy building A.I. products, he is exploring research topics at the intersection of language, reasoning and learning, with several publications at major conferences (e.g. WWW, SIGIR, RecSys, NAACL). In previous lives, he managed to get a Ph.D., do scienc-y things for a pro basketball team, and simulate a pre-Columbian civilization.
Topics: MLOps, Metaflow, model cards.
Question generation using Natural Language Processing by QuestGen.AI - Data Science Milan
Manual question generation (worksheets and quizzes) in edtech is not scalable for the online transformation and, with the pandemic, has increased the workload on teachers. In this session, we will explore natural language processing (NLP) techniques to generate multiple choice questions automatically from any text content using the T5 transformer model. We will also explore methods to deploy the T5 question generation model for fast CPU inference using ONNX conversion and quantization.
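A minimal sketch of the generation step with Hugging Face transformers; the checkpoint name is a placeholder for a T5 model fine-tuned on question generation, and the "answer: ... context: ..." input format is one common convention, not necessarily the one used in the talk.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder checkpoint; a real one would be fine-tuned for question generation
model_name = "your-org/t5-base-question-generation"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Mark the desired answer and supply the source passage
text = "answer: Paris context: Paris has been the capital of France since 508."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```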
Bio:
Ramsri is a Lead Data Scientist with 8+ years of work experience across Silicon Valley, Singapore, and India. Most recently he was a co-founder and CTO of a funded AI-assisted assessments startup. He has spent the last 2 years developing question generation models in edtech and has also released an open-source library on the same.
MLOps with a Feature Store: Filling the Gap in ML Infrastructure - Data Science Milan
A Feature Store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so, they need a common data processing platform. The first feature stores, developed at hyperscale AI companies such as Uber, Airbnb, and Facebook, enabled feature engineering using domain-specific languages, providing abstractions tailored to the companies’ feature engineering domains. However, a general purpose Feature Store needs a general purpose feature engineering, feature selection, and feature transformation platform.
In this talk, we describe how we built a general purpose, open-source Feature Store for ML around dataframes and Apache Spark. We will demonstrate how data engineers can transform and engineer features from backend databases and data lakes, while data scientists can use PySpark to select and transform features into train/test data in a file format of choice (.tfrecords, .npy, .petastorm, etc.) on a file system of choice (S3, HDFS). Finally, we will show how the Feature Store enables end-to-end ML pipelines to be factored into feature engineering and data science stages that can each run at a different cadence.
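A simplified PySpark sketch of the feature selection and train/test materialization described above, using plain Spark rather than any specific feature store API; the paths and column names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("feature-join").getOrCreate()

# Two feature groups previously engineered and stored as Parquet (placeholders)
profiles = spark.read.parquet("s3://example-bucket/features/customer_profiles")
activity = spark.read.parquet("s3://example-bucket/features/weekly_activity")

# Select and join features into a single training dataset
dataset = (profiles.join(activity, on="customer_id")
                   .select("customer_id", "age", "avg_spend", "sessions", "label"))

train, test = dataset.randomSplit([0.8, 0.2], seed=42)
train.write.mode("overwrite").parquet("s3://example-bucket/datasets/train")
test.write.mode("overwrite").parquet("s3://example-bucket/datasets/test")
```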
Bio:
Fabio Buso is the head of engineering at Logical Clocks AB, where he leads the Feature Store development. Fabio holds a master's degree in cloud computing and services with a focus on data intensive applications, awarded by a joint program between KTH Stockholm and TU Berlin.
Topics: feature store, MLOps.
Reinforcement Learning is a growing subset of Machine Learning and one of the most important frontiers of Artificial Intelligence. Its goal is to capture higher logic and use more adaptable algorithms than classical Machine Learning.
Formally it denotes a set of algorithms that deal with sequential decision-making and have the potential capability to make highly intelligent decisions depending on their local environment.
Reinforcement Learning problems can be described as an agent that has to make decisions in its environment in order to optimize a cumulative reward, and it is clear that this formalization applies to a great variety of tasks in many different fields.
In this talk, the main features of the most important Reinforcement Learning algorithms will be illustrated and explored in depth, with some concrete and explanatory examples.
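To ground the formalization above, here is a self-contained tabular Q-learning sketch on a toy chain environment; the environment is an invented illustration, not drawn from the talk.

```python
import numpy as np

# Toy deterministic chain: states 0..4, actions 0 (left) / 1 (right),
# reward 1.0 whenever the agent reaches the rightmost state.
n_states, n_actions = 5, 2

def env_step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    for t in range(20):
        # epsilon-greedy action selection
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state, reward = env_step(state, action)
        # Q-learning update: move Q(s, a) toward the bootstrapped target
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                     - Q[state, action])
        state = next_state

print(Q.argmax(axis=1))  # learned policy: move right in every state
```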
Bio:
Marco Del Pra
Marco was born in Venice 41 years ago, has two master's degrees (Computer Science and Mathematics), and has two important publications in applied mathematics.
He has been working in Artificial Intelligence for 10 years, mainly as a freelancer. Among others, he worked for the European Commission's Joint Research Center, for Cuebiq, and as Data Science Lead for Microsoft's Artificial Intelligence projects in Italy.
Time Series Classification with Deep Learning | Marco Del Pra - Data Science Milan
Today a lot of data is stored in the form of time series, and with the current wide diffusion of real-time applications many areas are strongly increasing their interest in applications based on this kind of data: for example finance, advertising, marketing, health care, automated disease detection, biometrics, retail, and identification of anomalies of any kind. It is therefore very interesting to understand the role and potential of machine learning in this sector.
Many methods can be used for the classification of the time series, but all of them, apart from deep learning, require some kind of feature engineering as a separate stage before the classification is performed, and this can imply the loss of some important information and the increase of the development and test time. On the contrary, deep learning models such as recurrent and convolutional neural networks already incorporate this kind of feature engineering internally, optimizing it and eliminating the need to do it manually. Therefore they are able to extract information from the time series in a faster, more direct, and more complete way.
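A minimal Keras sketch of a convolutional classifier for fixed-length univariate series, illustrating how the feature extraction lives inside the network rather than in a separate stage; the sizes and class count are arbitrary.

```python
import tensorflow as tf

# Series of length 128 with 1 channel, classified into 3 classes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 1)),
    tf.keras.layers.Conv1D(64, kernel_size=7, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),   # summarize learned features
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```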
Bio:
Marco Del Pra
I am 41 years old, I was born in Venice, and I have 2 master's degrees (Computer Science and Mathematics). I have been working for about 10 years in Artificial Intelligence, first as a Data Scientist, then as a Team Leader and finally as Head of Data. Among others, I worked for Microsoft, for the European Commission (JRC of Ispra) and for Cuebiq. I am currently working as a freelancer and, together with 2 other co-founders, I am creating an innovative AI startup. I have 2 important publications in applied mathematics.
Topics: recurrent and convolutional neural networks, deep learning, time-series.
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI - Data Science Milan
The talk will introduce Ludwig, a deep learning toolbox that allows you to train models and use them for prediction without the need to write code. It is unique in its ability to help make deep learning easier to understand for non-experts and to enable faster model improvement iteration cycles for experienced machine learning developers and researchers alike. By using Ludwig, experts and researchers can simplify the prototyping process and streamline data processing so that they can focus on developing deep learning architectures.
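A minimal sketch of Ludwig's declarative style via its Python API; the feature names and CSV files are placeholders, and the exact config options depend on the Ludwig version.

```python
from ludwig.api import LudwigModel

# Declarative model definition: no training loop or architecture code
config = {
    "input_features": [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

model = LudwigModel(config)
model.train(dataset="reviews.csv")          # placeholder CSV with the two columns
results = model.predict(dataset="new_reviews.csv")
```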
Bio:
Piero Molino is a Senior Research Scientist at Uber AI with a focus on machine learning for language and dialogue. Piero completed a PhD on Question Answering at the University of Bari, Italy. He founded QuestionCube, a startup that built a framework for semantic search and QA, worked for Yahoo Labs in Barcelona on learning to rank and for IBM Watson in New York on natural language processing with deep learning, and then joined Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, he became one of the founding members of Uber AI Labs.
Audience projection of target consumers over multiple domains: a NER and Baye... - Data Science Milan
Traditional market research is generally conducted by questionnaires or other forms of explicit feedback, posed directly to an ad hoc panel of individuals who in aggregate are representative of a larger group of people. Unfortunately, those traditional approaches are often invasive, non-scalable, and biased. Indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases) are more scalable, authentic, and more suitable for real-time consumer insights.
Although those sources of implicit consumer feedback provide relevant and detailed pictures of the population, they individually provide only a limited set of observable behaviors.
The Holy Grail of market research is the ability to merge different sources of consumers interests into an augmented view that connects all the dots across multiple domains.
Unfortunately, user-centric "fusion" algorithms present many limitations in the case of heterogeneous datasets that differ strongly in size and density, and when the number of sources to merge increases.
We propose a novel approach of Audience Projection, able to define a target audience as a subset of the population in a source domain and to project this target onto a set of users in a destination dataset.
We will show how libraries such as spaCy can provide deep learning implementations for Named Entity Recognition (NER) to match related brands, and we will use Bayesian inference to transfer knowledge from the source domain. This way, we can estimate the probability that a user belongs to the target, using the source-domain distribution of interest volumes over common entities as model evidence and the source target size as the prior probability.
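A toy sketch of the two ingredients named above: spaCy NER to spot brand entities, then a Bayes update for the probability that a user belongs to the target. All numbers are invented for illustration, and the small English model must be installed separately.

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Interest in Nike and Adidas is rising among urban runners.")
brands = [ent.text for ent in doc.ents if ent.label_ == "ORG"]

# Bayes update: P(target | entity) = P(entity | target) * P(target) / P(entity)
prior = 0.10                      # share of the target audience in the source domain
likelihood = {"Nike": 0.30}       # P(entity | target), from source interest volumes
evidence = {"Nike": 0.12}         # P(entity) overall in the source domain
for brand in brands:
    if brand in likelihood:
        posterior = likelihood[brand] * prior / evidence[brand]
        print(brand, round(posterior, 3))
```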
Bio:
Gianmario Spacagna is the chief scientist and head of AI at Helixa. His team’s mission is building the next generation of behavior algorithms and models of human decision making with careful attention to their potential and effects on society. His experience covers a diverse portfolio of machine learning algorithms and data products across different industries. Previously, he worked as a data scientist in IoT automotive (Pirelli Cyber Technology), retail and business banking (Barclays Analytics Centre of Excellence), threat intelligence (Cisco Talos), predictive marketing (AgilOne), plus some occasional freelancing. He’s a co-author of the book Python Deep Learning, contributor to the “Professional Manifesto for Data Science,” and founder of the Data Science Milan community. Gianmario holds a master’s degree in telematics (Polytechnic of Turin) and software engineering of distributed systems (KTH of Stockholm). After having spent half of his career abroad, he now lives in Milan. His favorite hobbies include home cooking, hiking, and exploring the surrounding nature on his motorcycle.
Weakly Supervised Learning: Introduction and Best Practices
In the talk we will introduce the three main types of weakly supervised learning: incomplete, inexact and inaccurate. We will examine how models can be trained under weak supervision and review real applications of weakly supervised learning, showing how it can improve results and decrease costs.
Bio:
Kristina Khvatova works as a Software Engineer at Softec S.p.A. Currently she is involved in the development of a project for data analysis and visualisation; it includes quantitative and qualitative analysis based on classification, optimisation, time series prediction and anomaly detection techniques. She obtained a master's degree in Mathematics at Saint Petersburg State University and a master's degree in Computer Science at the University of Milano-Bicocca.
GANs beyond nice pictures: real value of data generation, Alex Honchar - Data Science Milan
GANs beyond nice pictures: real value of data generation (theory and business applications)
About the speaker, Alex Honchar:
I am a machine learning expert currently applying AI in medtech, fintech and other areas. I also enjoy teaching and blogging (50k+ views monthly) about deep learning applications. As an academia member, I have a track record of scientific publications as well. Besides science, I travel, do sports and perform card magic.
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco - Data Science Milan
Humans have the extraordinary ability to learn continually from experience. Not only can we apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning. One of the grand goals of AI is building an artificial continually learning agent that constructs a sophisticated understanding of the world from its own experience through the autonomous incremental development of ever more complex skills and knowledge.
"Continual Learning" (CL) is indeed a fast emerging topic in AI concerning the ability to efficiently improve the performance of a deep model over time, dealing with a long (and possibly unlimited) sequence of data/tasks. In this workshop, after a brief introduction of the topic, we’ll implement different Continual Learning strategies and assess them on common vision benchmarks. We’ll conclude the workshop with a look at possible real world applications of CL.
Vincenzo Lomonaco is a Deep Learning PhD student at the University of Bologna and founder of ContinualAI.org. He is also the PhD students' representative at the Department of Computer Science and Engineering (DISI) and a teaching assistant for the courses “Machine Learning” and “Computer Architectures” in the same department. Previously, he was a Machine Learning software engineer at IDL in-line Devices and a Master's student at the University of Bologna, where he graduated cum laude in 2015 with the dissertation “Deep Learning for Computer Vision: a Comparison Between CNNs and HTMs on Object Recognition Tasks”.
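As a taste of the workshop's hands-on part, here is a sketch of one of the simplest CL strategies, rehearsal with a bounded replay buffer; this is a generic illustration, not the workshop's own code.

```python
import random

class ReplayBuffer:
    """Rehearsal strategy: keep a bounded random sample of past tasks' examples
    and mix them into each new task's training batches to reduce forgetting."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.examples = []

    def add(self, new_examples):
        self.examples.extend(new_examples)
        if len(self.examples) > self.capacity:
            # Keep a uniform random subset when over capacity
            self.examples = random.sample(self.examples, self.capacity)

    def sample(self, k):
        return random.sample(self.examples, min(k, len(self.examples)))

buffer = ReplayBuffer(capacity=500)
task_stream = [["t0-a", "t0-b"], ["t1-a"], ["t2-a"]]   # stand-in for real task data
for task_id, task_data in enumerate(task_stream):
    mixed_batch = task_data + buffer.sample(2)   # train on new data plus replay
    buffer.add(task_data)
    print(task_id, mixed_batch)
```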
Processing 3D images has many use cases: for example, improving autonomous driving, enabling digital conversions of old factory buildings, and enabling augmented reality solutions for medical surgeries. 3D images also help in 3D modeling and in the safety evaluation of products.
3D image processing brings enormous benefits but also amplifies computing cost. The size of the point cloud, the number of points, the sparse and irregular structure of point clouds, and the adverse impact of light reflections, (partial) occlusions, etc., make it difficult for engineers to process point clouds.
Moving from hand-crafted features to deep learning techniques to semantically segment images, classify objects, detect objects, and detect actions in 3D videos, we have come a long way in 3D image processing.
3D Point Cloud image processing is increasingly used to solve Industry 4.0 use cases to help architects, builders and product managers. I will share some of the innovations that are helping the progress of 3D point cloud processing. I will share the practical implementation issues we faced while developing deep learning models to make sense of 3D Point Clouds.
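A short Open3D sketch of the kind of preprocessing these issues force on a pipeline: downsampling a large, irregular cloud and removing outlier points caused by reflections; the input file is a placeholder.

```python
import open3d as o3d

# Placeholder input: any PCD/PLY scan file
pcd = o3d.io.read_point_cloud("scan.pcd")

# Downsample to tame point count and irregular density
down = pcd.voxel_down_sample(voxel_size=0.05)
down.estimate_normals()

# Remove sparse outliers (e.g., from reflections or sensor noise)
clean, kept_indices = down.remove_statistical_outlier(nb_neighbors=20,
                                                      std_ratio=2.0)
print(f"{len(pcd.points)} points -> {len(clean.points)} after cleaning")
```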
Attendees: beginners and intermediate practitioners skilled in image processing and 3D point clouds.
Profile of the speaker:
SK Reddy is the Chief Product Officer AI in Hexagon (www.hexagon.com). He is an AI and ML expert and a successful twice startup entrepreneur. He is an AI startup advisor too. Also he is a frequent speaker in conferences and is an AI blogger.
Deep time-to-failure: predicting failures, churns and customer lifetime with recurrent neural networks - Data Science Milan
The notebook and documentation of the original tutorial are available at https://github.com/gm-spacagna/deep-ttf.
Deep Time-to-Failure: predicting failures, churns and customer lifetime using recurrent neural networks.
Machines and customers are among the most valuable assets for many businesses. A common trait of these assets is that sooner or later they will fail or, in the case of customers, they will churn.
In order to catch those failure events we would ideally consider the whole history of available machine/customer information and learn smart representations of the system status over time.
Traditional machine learning and statistical models approach the prediction of time-to-failure, i.e. the expected lifetime, as a supervised regression problem using handcrafted features.
Training those models is hard because of three main reasons:
The complexity of extracting predictive features from time-series without overfitting.
The difficulty of modeling uncertainty and confidence levels in the predictions.
The scarcity of labeled data: failure events are by definition rare, and that results in highly unbalanced training datasets.
The first issue can be solved by adopting recurrent neural architectures.
A solution to the last two problems could be to exploit censored data and to build survival regression models.
In this talk we will present a novel technique based on recurrent neural networks that can turn any variable-length sequence of data into a probability distribution representing the estimated remaining time to the failure event. The network will be trained in the presence of ground truth as well as with right-censored data.
We will demonstrate the approach on a case study of simulated degradation of 100 jet engines, provided by NASA.
During the tutorial you will learn:
What is Survival Analysis and what are the most popular Survival Regression techniques.
How a Weibull distribution can be used as a generic distribution for modeling time-to-failure events.
How to build a deep learning algorithm in Keras leveraging recurrent units (LSTM or GRU) that can map raw time-series of covariates into Weibull probability distributions (a sketch follows below).
The tutorial will also cover a few common pitfalls, visualizations and evaluation tools useful for testing and adapting this approach to generic use cases.
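A condensed sketch of the core idea under the tutorial's assumptions: a recurrent layer emits a Weibull scale alpha and shape beta per sequence, trained with a log-likelihood that handles right-censored examples (event indicator u = 0). The exact formulation in the deep-ttf repo may differ in details.

```python
import tensorflow as tf

def weibull_loglik(y_true, y_pred):
    # y_true = [t, u]: observed time and event indicator (u=0 -> right-censored)
    t, u = y_true[:, 0], y_true[:, 1]
    a, b = y_pred[:, 0], y_pred[:, 1]          # Weibull alpha (scale), beta (shape)
    cum_hazard = tf.pow((t + 1e-9) / a, b)     # (t/alpha)^beta
    # log density without the survival term; censored samples contribute only -cum_hazard
    log_f = (tf.math.log(b) - tf.math.log(a)
             + (b - 1.0) * (tf.math.log(t + 1e-9) - tf.math.log(a)))
    return -tf.reduce_mean(u * log_f - cum_hazard)

# Variable-length sequences with 8 covariates per timestep (arbitrary sizes)
inputs = tf.keras.layers.Input(shape=(None, 8))
hidden = tf.keras.layers.GRU(32)(inputs)
alpha_beta = tf.keras.layers.Dense(2, activation="softplus")(hidden)  # keep positive
model = tf.keras.Model(inputs, alpha_beta)
model.compile(optimizer="adam", loss=weibull_loglik)
```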
You are free to bring your laptop if you would like to do some live coding and experiment yourself. In this case we strongly encourage you to check that you have all of the requirements installed on your machine.
More details on the required packages can be found in the GitHub repository gm-spacagna/deep-ttf.
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ... - Data Science Milan
50 Shades of Text - Leveraging Natural Language Processing (NLP) to validate, improve, and expand the functionalities of a product
Nowadays, every company either stores or produces text data: from web logs and user queries to translations and support tickets. Yet not everyone knows how to extract valuable insights from it. In this session, we will present a practical case of how to move from raw text data to a valuable business application, leveraging some of the major NLP methodologies (word embedding, word2vec, doc2vec, fastText, etc.).
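A tiny gensim sketch of the word2vec step in such a pipeline; the toy corpus stands in for real logs or support tickets.

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a tokenized user query / support ticket
corpus = [
    ["late", "delivery", "refund", "request"],
    ["refund", "issued", "delivery", "delay"],
    ["great", "product", "fast", "delivery"],
]

# Skip-gram word2vec; min_count=1 only because the corpus is tiny
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("delivery", topn=2))
```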
Bio: Alessandro is a data veteran. He holds two Master’s degrees in computer engineering, one from Politecnico di Milano and the other from the University of Illinois at Chicago (UIC).
He started his career in data consultancy, where he mastered Apache Spark for Machine Learning projects and subsequently joined WW Grainger, one of the largest MRO e-commerce companies in the United States. In September 2017, after more than 5 years in the USA, Alessandro returned to his native country, Italy, where he is now leading a team of data scientists. His current work focuses on achieving energy efficiency through the automation of energy management processes for commercial customers.
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply - Data Science Milan
“Product close-out strategy” by Ilaria Gianoli, Data Scientist, Data Reply
Abstract:
How should we deal with products in their decline phase? Ilaria will share her experience in optimizing the close-out strategy for a multinational retail leader, with a particular focus on price optimization.
Bio:
Ilaria is a Data Scientist at Data Reply, where she works as a consultant across different industries, in particular retail. She uses her mathematical, statistical and machine learning background to turn data into business opportunities. She also works closely with the business to provide quantitative support for decision making, adapting the complexity of the mathematical models to customer needs.
She holds a MSc in Applied Statistics - Mathematical Engineering from Politecnico di Milano.
“Online pricing: from theory to application” by Giovanni Corradini, Data Scientist, Data Reply
Abstract:
Multi-Armed Bandit algorithms are populating the world of e-commerce. How do they work?
Giovanni will share the basics of this field and an application of a state-of-the-art algorithm to a real-world simulation in the ticket industry.
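A self-contained Thompson-sampling sketch for price selection, in the spirit of the talk; the price points and conversion rates are invented purely for the simulation.

```python
import numpy as np

prices = np.array([9.99, 12.99, 14.99])   # candidate price points (the arms)
true_conv = [0.30, 0.22, 0.15]            # hidden conversion rates (simulation only)
alpha = np.ones(3)                         # Beta(1, 1) priors on conversion per arm
beta = np.ones(3)
rng = np.random.default_rng(0)

revenue = 0.0
for visitor in range(10_000):
    theta = rng.beta(alpha, beta)              # sample a conversion rate per arm
    arm = int(np.argmax(theta * prices))       # pick the best sampled expected revenue
    sold = rng.random() < true_conv[arm]       # simulate the visitor's decision
    alpha[arm] += sold                         # Bayesian update of the chosen arm
    beta[arm] += 1 - sold
    revenue += prices[arm] * sold

print("average revenue per visitor:", revenue / 10_000)
```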
Bio: Giovanni is a Data Scientist at Data Reply.
He holds a MSc in Applied Statistics - Mathematical Engineering from Politecnico di Milano.
He has a background in statistics, machine learning and data mining, and he provides decision-making support to industries in many different fields.
“Renewal Price Optimization for Subscription products” by Riccardo Lorenzon, Data Scientist, Data Reply
Abstract:
We are observing a huge shift in the modern economy from a pay-per-product model to a subscription-based model. When it comes to pricing strategies, it is important both to close the single deal and to monetize long-term relationships with the customer. Riccardo will present an application of subscription renewal pricing optimization models for a company in the publishing industry.
Bio:
Riccardo holds a MSc in Mathematical Models for Decision Making from Politecnico di Milano.
He has developed hands-on experience in end-to-end data projects across multiple industries. His proactive creativity makes him very effective in business case design and in the early stages of projects.
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...Data Science Milan
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrigoni, Senior Data Scientist, Pirelli (pirelli.com)
Abstract:
Pirelli, a global performance tire manufacturer, uses data science in its 20 factories to improve quality and efficiency, and reduce energy consumption. For this “Smart Manufacturing” initiative, Pirelli’s data science team has developed predictive models and analytics tools to monitor processes, machines and materials on the factory floors. In this talk we will show some of the solutions we deploy, demonstrate how we used Domino’s data science platform and Plot.ly to build these solutions, and discuss the next steps in this journey towards predictive maintenance.
Bio:
Alberto Arrigoni is a data scientist at Pirelli, where he processes sensor and telemetry data for IoT, Smart Factory and connected-vehicle applications.
He works closely with all major business units such as R&D, industrial engineering and BI to develop tailored machine learning algorithms and production systems.
He holds a PhD in biostatistics from the University of Milano-Bicocca. Prior to joining Pirelli he was a staff data scientist at the National Institute of Molecular Genetics (Milan), as well as a Fulbright student at Santa Clara University and a visiting PhD student at Pacific Biosciences (Menlo Park, CA).
A brief introduction to Cerved's data, the role of the data scientist at Cerved, and how a data scientist can take advantage of graph databases.
Bio:
Stefano Gatti: Born in 1970, he has been involved for more than 15 years in big data and technology-driven projects at leading business information companies such as Lince and Cerved. He is very fond of agile methodologies and tries to apply them at all organizational levels. In recent years he has been strongly engaged in spreading innovation within Cerved and in taking advantage of new big and smart data technologies, especially from a business usage perspective. Datatelling, open innovation and partnerships with smart actors in the worldwide data-driven innovation ecosystem are his current mantras.
Nunzio Pellegrino: Data Scientist at Cerved, part of the Innovation team, focused on extracting value from data and solving problems with the latest available technologies. He holds a degree in Statistics with a background in Machine Learning, and worked primarily on Data Integration and Business Intelligence projects for 3 years. He is currently product owner of a web application based on GraphDB and is involved in Italian Open Data projects. He is an R enthusiast, a Python practitioner and fascinated by the graph ecosystem.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and an overview of the platform. You will gain a good understanding of the phases in Communications Mining as we walk through the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How it can help today’s businesses, and its benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Securing your Kubernetes cluster: a step-by-step guide to success!KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
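To give a flavour of the kind of checks involved, below is a minimal audit sketch, not the step-by-step guide from the talk. It assumes the official `kubernetes` Python client and a kubeconfig pointing at a reachable cluster, and flags containers that request privileged mode, a common first hardening check:

```python
# Minimal cluster-hardening check: list containers that request privileged
# mode. Assumes `pip install kubernetes` and a valid kubeconfig.
from kubernetes import client, config

def find_privileged_containers():
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    offenders = []
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        for container in pod.spec.containers:
            sc = container.security_context
            if sc is not None and sc.privileged:
                offenders.append(
                    (pod.metadata.namespace, pod.metadata.name, container.name)
                )
    return offenders

if __name__ == "__main__":
    for namespace, pod_name, container_name in find_privileged_containers():
        print(f"privileged container: {namespace}/{pod_name}/{container_name}")
```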
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technologies used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users want to take full advantage of the features available on their devices, but many features provide convenience and capability at the expense of security. This best-practices guide outlines steps users can take to better protect their personal devices and information.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed of release to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how they work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
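For readers new to the pattern, here is a toy sketch of the core GraphRAG idea, retrieving a neighbourhood of facts from a knowledge graph and turning it into LLM context. The in-memory triple list and helper names are hypothetical stand-ins, not FalkorDB's or Microsoft's actual APIs:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def query_graph(entity: str, kg: List[Triple], hops: int = 1) -> List[Triple]:
    """Collect triples within `hops` steps of a seed entity (toy BFS)."""
    frontier, seen = {entity}, set()
    for _ in range(hops):
        nxt = set()
        for triple in kg:
            s, _, o = triple
            if s in frontier or o in frontier:
                seen.add(triple)
                nxt.update((s, o))
        frontier |= nxt
    return sorted(seen)

def build_prompt(question: str, facts: List[Triple]) -> str:
    # Serialize the retrieved subgraph into plain-text context for the LLM.
    context = "\n".join(f"{s} {p} {o}." for s, p, o in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"

# Hypothetical toy graph:
kg = [("FalkorDB", "is_a", "graph database"),
      ("GraphRAG", "grounds", "LLM answers"),
      ("GraphRAG", "retrieves_from", "knowledge graphs")]
print(build_prompt("What does GraphRAG retrieve from?", query_graph("GraphRAG", kg)))
```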
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.