Amazon Redshift is a fast, simple, cost-effective data warehousing solution, and in this session, we look at the tools and techniques you can use to migrate your existing data warehouse to Amazon Redshift. We will then present a case study on Scholastic’s migration to Amazon Redshift. Scholastic, a large 100-year-old publishing company, was running its business on older, on-premises data warehousing and analytics solutions, which were expensive and could not keep up with business needs. Scholastic also needed to add new capabilities such as streaming data and real-time analytics. Scholastic migrated to Amazon Redshift and achieved agility and faster time to insight while dramatically reducing costs. In this session, Scholastic will discuss how they achieved this, including options considered, technical architecture implemented, results, and lessons learned.
This describes a conceptual model approach to designing an enterprise data fabric: the set of hardware and software infrastructure, tools and facilities needed to implement, administer, manage and operate data operations across the entire span of the data within the enterprise. It covers all data activities, including data acquisition, transformation, storage, distribution, integration, replication, availability, security, protection, disaster recovery, presentation, analytics, preservation, retention, backup, retrieval, archival, recall, deletion, monitoring and capacity planning, across all data storage platforms, enabling use by applications to meet the data needs of the enterprise.
The conceptual data fabric model represents a rich picture of the enterprise’s data context. It embodies an idealised and target data view.
Designing a data fabric enables the enterprise to respond to and take advantage of key related data trends:
• Internal and External Digital Expectations
• Cloud Offerings and Services
• Data Regulations
• Analytics Capabilities
It enables the IT function to demonstrate positive data leadership. It shows the IT function is able and willing to respond to business data needs. It allows the enterprise to meet data challenges:
• More and more data of many different types
• Increasingly distributed platform landscape
• Compliance and regulation
• Newer data technologies
• Shadow IT where the IT function cannot deliver IT change and new data facilities quickly
It is concerned with the design of an open and flexible data fabric that improves the responsiveness of the IT function and reduces shadow IT.
Build a simple data lake on AWS using a combination of services, including AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon S3.
Link to the blog post and video: https://garystafford.medium.com/building-a-simple-data-lake-on-aws-df21ca092e32
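As a rough illustration of how raw files in a data lake like this are often laid out on Amazon S3, the sketch below builds Hive-style partitioned object keys (`key=value` path segments), a layout that AWS Glue crawlers can register as table partitions. The prefix, table name, partition columns and file name are hypothetical and are not taken from the blog post.

```python
from datetime import date


def partitioned_key(prefix: str, table: str, day: date, filename: str) -> str:
    """Build an object key with Hive-style year/month/day partition segments."""
    return (
        f"{prefix}/{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    print(partitioned_key("raw", "sales", date(2021, 3, 7), "part-0000.csv"))
```

Keeping the partition values in the path lets a query engine such as Amazon Athena skip whole prefixes when a query filters on those columns.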
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a shift left towards a modern distributed architecture that allows for domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
Architect’s Open-Source Guide for a Data Mesh Architecture by Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark by Databricks
The trade-off between development speed and pipeline maintainability is a constant for data engineers, especially for those in a rapidly evolving organization.
Big data architectures and the data lake by James Serra
With so many new technologies, it can be confusing to work out the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Modernizing to a Cloud Data Architecture by Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models help one customer scale their analytics and AI workloads, along with best practices from their experience of successfully migrating their data and workloads to the cloud.
Scaling and Modernizing Data Platform with Databricks by Databricks
Today a Data Platform is expected to process and analyze a multitude of sources spanning batch files, streaming sources, backend databases, REST APIs, and more. There is clearly a need for a standardized platform that scales and is flexible, letting data engineers and data scientists focus on the business problems rather than managing the infrastructure and backend services. Another key aspect of the platform is multi-tenancy, to isolate the workloads and to track cost usage per tenant.
In this talk, Richa Singhal and Esha Shah will cover how to build a scalable Data Platform using Databricks and deploy your data pipelines effectively while managing the costs. The following topics will be covered:
• Key tenets of a Data Platform
• Set up a multistage environment on Databricks
• Build data pipelines locally and test on a Databricks cluster
• CI/CD for data pipelines with Databricks
• Orchestrating pipelines using Apache Airflow
• Change Data Capture using Databricks Delta
• Leveraging Databricks Notebooks for Analytics and Data Science teams
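To make the Change Data Capture topic in the list above a little more concrete, here is a minimal sketch of the underlying idea: a stream of insert, update and delete records is applied, in order, to a keyed target table. It uses plain Python dictionaries as a stand-in and is not the Databricks Delta API; the record shape and the `apply_changes` name are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change record: (operation, key, row); row is None for deletes.
Change = Tuple[str, int, Optional[Row]]


def apply_changes(table: Dict[int, Row], changes: List[Change]) -> Dict[int, Row]:
    """Return a new table with the change records applied in order."""
    result = dict(table)  # shallow copy so the input table is not mutated
    for op, key, row in changes:
        if op in ("insert", "update"):
            if row is None:
                raise ValueError(f"{op} for key {key} needs a row")
            result[key] = row  # upsert: add the row or overwrite the old one
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


if __name__ == "__main__":
    current = {1: {"name": "a"}, 2: {"name": "b"}}
    feed = [("update", 1, {"name": "a2"}), ("delete", 2, None), ("insert", 3, {"name": "c"})]
    print(apply_changes(current, feed))
```

In a real lakehouse table the same upsert-and-delete logic would typically be expressed as a merge operation against the target table rather than as an in-memory loop.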
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Migrating your traditional Data Warehouse to a Modern Data Lake by Amazon Web Services
In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into its architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit... by Amazon Web Services
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
Product-thinking is making a big impact in the data world with the rise of Data Products, Data Product Managers, data mesh, and treating “Data as a Product.” But Honest, No-BS: What is a Data Product? And what key questions should we ask ourselves while developing them? Tim Gasper (VP of Product, data.world) will walk through the Data Product ABCs as a way to make treating data as a product way simpler: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge.
Building a Logical Data Fabric using Data Virtualization (ASEAN) by Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent report Building the Unified Data Warehouse and Data Lake by leading industry analysts TDWI, 64% of organizations stated that the objective of a unified Data Warehouse and Data Lake is to get more business value, and 84% of organizations polled felt that a unified approach to Data Warehouses and Data Lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, and how the associated technologies of machine learning, artificial intelligence, and data virtualization can reduce time to value, hence increasing the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Many organizations focus on the licensing cost of Hadoop when considering migrating to a cloud platform. But other costs should be considered, as well as the biggest impact, which is the benefit of having a modern analytics platform that can handle all of your use cases. This session will cover lessons learned in assisting hundreds of companies to migrate from Hadoop to Databricks.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Parts 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra and CTO of Modulant; he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information,” a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta and share our experiences from different angles like architecture, application logic and user experience. We will look at how security, cluster configuration, resource consumption and workflow changed by using Databricks clusters, as well as how using Delta tables simplified our application logic and data operations.
Data Warehousing is a data architecture that separates reporting and analytics needs from operational transaction systems. This presentation is an introduction to traditional data warehousing architectures and how to determine if your environment requires a data warehouse.
Building the Data Lake with Azure Data Factory and Data Lake Analytics by Khalid Salama
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides the means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring and securing the environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build an Azure Data Factory pipeline to ingest data into the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift by Amazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third-party solutions, and how they met their business intelligence needs in record time.
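To show why a columnar layout suits analytic queries, the sketch below pivots a few row-oriented records into per-column lists and then sums one column without touching the others. It is a toy illustration in plain Python, not a description of how Amazon Redshift stores data internally; the sample rows and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data; every row is assumed to have the same keys.
ROWS: List[Dict[str, object]] = [
    {"order_id": 1, "region": "east", "amount": 20.0},
    {"order_id": 2, "region": "west", "amount": 30.5},
    {"order_id": 3, "region": "east", "amount": 15.0},
]


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row-oriented records into one list of values per column."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount(columns: Dict[str, List[object]]) -> float:
    """Aggregate a single column; the other columns are never read."""
    return sum(columns["amount"])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(cols["region"])
    print(total_amount(cols))
```

Because each column's values sit together, an aggregate over one column reads only that column's data, which is the property that schema and query tuning for a columnar warehouse tries to exploit.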
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin... by Amazon Web Services
Whether you’re planning a data center shutdown or just need to move large volumes of archived data from your on-premises environment, attend this webinar and learn more about how AWS Snowmobile and AWS Snowball Edge can help you migrate your terabytes or petabytes of critical data in a fast, secure and cost-effective way. Hear how customers are using these two new services to transform their business model and advance their IT strategy in a way that was not possible before from a time and cost perspective.
Learning Objectives:
• Learn about the capabilities, features, and benefits of AWS Snowball Edge and AWS Snowmobile
• Learn key use cases for AWS Snowball Edge and AWS Snowmobile
• Learn how AWS Snowball Edge is more than just a data transfer service
• Be able to determine when to use which data transfer service from AWS
Big data architectures and the data lakeJames Serra
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottoms-up approach to analytics, and how you can use a data lake and a RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Scaling and Modernizing Data Platform with DatabricksDatabricks
Today a Data Platform is expected to process and analyze a multitude of sources spanning batch files, streaming sources, backend databases, REST APIs, and more. There is clearly a need for standardizing the platform that scales and be flexible letting data engineers and data scientists focus on the business problems rather than managing the infrastructure and backend services. Another key aspect of the platform is multi-tenancy to isolate the workloads and able to track cost usage per tenant.
In this talk, Richa Singhal and Esha Shah will cover how to build a scalable Data Platform using Databricks and deploy your data pipelines effectively while managing the costs. The following topics will be covered:
Key tenets of a Data Platform
Setup multistage environment on Databricks
Build data pipelines locally and test on Databricks cluster
CI/CD for data pipelines with Databricks
Orchestrating pipelines using Apache Airflow – Change Data Capture using Databricks Delta
Leveraging Databricks Notebooks for Analytics and Data Science teams
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Migrating your traditional Data Warehouse to a Modern Data LakeAmazon Web Services
In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into its architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
Product-thinking is making a big impact in the data world with the rise of Data Products, Data Product Managers, data mesh, and treating “Data as a Product.” But Honest, No-BS: What is a Data Product? And what key questions should we ask ourselves while developing them? Tim Gasper (VP of Product, data.world), will walk through the Data Product ABCs as a way to make treating data as a product way simpler: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge.
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent Building the Unified Data Warehouse and Data Lake report by leading industry analysts TDWI, we have discovered 64% of organizations stated the objective for a unified Data Warehouse and Data Lakes is to get more business value and 84% of organizations polled felt that a unified approach to Data Warehouses and Data Lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric and the associated technologies of machine learning, artificial intelligence, and data virtualization can reduce time to value. Hence, increasing the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Many organizations focus on the licensing cost of Hadoop when considering migrating to a cloud platform. But other costs should be considered, as well as the biggest impact, which is the benefit of having a modern analytics platform that can handle all of your use cases. This session will cover lessons learned in assisting hundreds of companies to migrate from Hadoop to Databricks.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta and share our experiences from different angles like architecture, application logic and user experience. We will look how security, cluster configuration, resource consumption and workflow changed by using Databricks clusters as well as how using Delta tables simplified our application logic and data operations.
Data Warehousing is a data architecture that separates reporting and analytics needs from operational transaction systems. This presentation is an introduction into traditional data warehousing architectures and how to determine if your environment requires a data warehouse.
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
In essence, a data lake is commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides means to ingest data, perform scalable big data processing, and serve information, in addition to manage, monitor and secure the it environment. In these slide, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture if the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build Azure Data Factory pipeline to ingest the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift - Amazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third party solutions, and how they met their business intelligence needs in record time.
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin... - Amazon Web Services
Whether you’re planning a data center shut down or just need to move large volumes of archived data from your on-premises environment, attend this webinar and learn more about how AWS Snowmobile and AWS Snowball Edge can help you migrate your terabytes or petabytes of critical data in a fast, secure and cost effective way. Hear how customers are using these two new services to transform their business model and advance their IT strategy in a way that was not possible before from a time and cost perspective.
Learning Objectives:
• Learn about the capabilities, features, and benefits of AWS Snowball Edge and AWS Snowmobile
• Learn key use cases for AWS Snowball Edge and AWS Snowmobile
• Learn how AWS Snowball Edge is more than just a data transfer service
• Be able to determine when to use which data transfer service from AWS
Amazon Elastic Block Store (Amazon EBS) provides persistent block level storage volumes for use with Amazon EC2 instances. In this technical session, we conduct a detailed analysis of the differences among the three types of Amazon EBS block storage: General Purpose (SSD), Provisioned IOPS (SSD), and Magnetic. We discuss how to maximize Amazon EBS performance, with a special eye towards low-latency, high-throughput applications like databases. We discuss Amazon EBS encryption and share best practices for Amazon EBS snapshot management. Throughout, we share tips for success.
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS - Amazon Web Services
This session explores some of the key features of Amazon Glacier, including security, durability, and configuration for storing compliance and regulatory data. It covers best practices for managing your cold data, including ingest, retrieval, and security controls. Other topics include: how to optimize storage, upload, and retrieval costs; how to identify the most applicable workloads; and recommended optimizations based on a few sample use cases from a number of industry verticals.
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304) - Amazon Web Services
Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, DynamoDB Streams, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks - Amazon Web Services
Millions of customers are leveraging AWS for increased flexibility, scalability, and reliability. Attend this hands-on workshop to learn the basics of AWS as you build a simple static website on AWS. After a brief overview, this session will dive into discussions of core AWS services, such as Amazon S3, Route 53 and Amazon CloudFront and demonstrate how to utilize those services to deploy a static website, associate a domain name for it, and enable it to load quickly. By the end of the hands-on session, you will have your own website running in your AWS account.
Learning Objectives:
• Learn how to deploy a static website using Amazon S3. Amazon S3 will provide the origin for your website as well as storage for your static content.
• Associate your domain name with your website using Amazon Route 53. Amazon Route 53 will tell the Domain Name System (DNS) where to find your website.
• Enable your website to load quickly using Amazon CloudFront. Amazon CloudFront will create a content delivery network (CDN) that hosts your website content in close proximity to your users.
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin... - Amazon Web Services
To help prevent unexpected access to your AWS resources, it is critical to maintain strong identity and access policies. It is equally important to track and alert on changes to your AWS resources. In this tech talk, you will learn how to use AWS Identity and Access Management (IAM) to control access to your AWS resources and integrate your existing authentication system with AWS IAM. We will cover how you can deploy and control your AWS infrastructure using code templates, including change management policies with AWS CloudFormation. In addition, we will explore different options for managing both your AWS access logs and your Amazon Elastic Compute Cloud (EC2) system logs using Amazon CloudWatch Logs. We also will cover how to use these logs to implement an audit and compliance validation process using services such as AWS Config, AWS CloudTrail, and Amazon Inspector.
Learning Objectives:
• Understand the AWS Shared Responsibility Model.
• Understand AWS account and identity management options and configuration.
• Learn the concept of infrastructure as code and change management using AWS CloudFormation.
• Learn how to audit and log your AWS service usage.
• Learn about AWS services to add automatic compliance checks to your AWS infrastructure.
With AWS, you can choose the right storage service for the right use case. This session shows the range of AWS choices - object storage to block storage - that is available to you. We include specifics about real-world deployments from customers who are using Amazon S3, Amazon EBS, Amazon Glacier, and AWS Storage Gateway.
Not just for archiving or compliance use cases, Amazon Glacier accommodates customers simply looking to replace their on-premises long term storage with a cost efficient, durable, cloud option, from which they can easily and quickly access their data when they need to. This session will introduce newly launched features for Amazon Glacier, review the current service feature set, and share the global data center shut down and storage strategy for Sony DADC New Media Solutions (NMS). NMS is Sony’s digital servicing division providing global digital distribution, linear playout and white label OTT/Commerce solutions for clients such as BBC Worldwide, NBCUniversal, Sony Playstation, and Funimation Entertainment.
Hear from Andy Shenkler, NMS’s Chief Technology and Solutions Officer, as he talks about the key factors that drove the organization’s decision to move away from tape, toward the cloud, and out of the infrastructure business overall. Learn more about the impact and operational practices inside a world-class digital supply chain as they moved over 20 petabytes of data, over 1M hours of video, to the cloud and never looked back.
With AWS, you can choose the right storage service, including Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Block Store (Amazon EBS), for the right use case. This session shows the range of AWS choices, from object storage to block storage, that are available to you. The session will also include specifics about real-world deployments from customers who are using Amazon S3, Amazon EBS, Amazon Glacier, and AWS Storage Gateway.
Reasons to attend:
Learn how to select which storage options to use, based on your requirements for cost, access pattern and use case.
Understand why AWS is a perfect platform for the storage of digital assets, data, media and backups.
Discover how Glacier can revolutionize your long term archive management by removing the need for costly and fragile media types.
Hear about customer use cases and a rich partner ecosystem of services built on AWS storage services.
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ... - Amazon Web Services
In this session, we provide a peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns with our Redis and Memcached offerings and how customers have used them for in-memory operations to reduce latency and improve application throughput. During this session, we review ElastiCache best practices, design patterns, and anti-patterns.
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series - Amazon Web Services
You can gain substantially more business insights and save costs by migrating your on-premises data warehouse to Amazon Redshift, a fast, petabyte-scale data warehouse that makes it simple to analyze big data at a fraction of the cost of traditional data warehouses. This webinar will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process.
Learning Objectives:
• Understand how Amazon Redshift can deliver richer, faster analytics at much lower cost.
• Learn key factors to consider before migrating and how to put together a migration plan.
• Learn best practices and tools for migrating schema, data, ETL and SQL queries.
Explore DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others.
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier - Amazon Web Services
Join our webinar to learn more about how to build a cost effective archive application using Amazon Glacier, an extremely low cost, secure, highly durable, and easy to use storage service in the AWS cloud.
We will explain how Amazon Glacier works and walk through some best practices to get the most out of the service.
We will also highlight how to choose between Amazon Glacier and Amazon S3’s Glacier storage option.
Learn more: http://aws.amazon.com/glacier/
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r... - Amazon Web Services
Learn how you can use Amazon ElastiCache to easily deploy a Memcached or Redis compatible, in-memory caching system to speed up your application performance. We show you how to use Amazon ElastiCache to improve your application latency and reduce the load on your database servers. We'll also show you how to build a caching layer that is easy to manage and scale as your application grows. During this session, we go over various scenarios and use cases that can benefit by enabling caching, and discuss the features provided by Amazon ElastiCache.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304) - Amazon Web Services
In this session, you learn about the latest and hottest features of Amazon Redshift. Join Vidhya Srinivasan, General Manager of Amazon Redshift, to take a deep dive into the architecture and inner workings of Amazon Redshift. You discover how the recent availability, performance, and manageability improvements we’ve made can significantly enhance your end user experience. You also get a glimpse of what we are working on and our plans for the future.
Talk on Amazon Redshift, Meetup Les Nouvelles Organisations, 11/02/2016, Paris - http://www.meetup.com/fr-FR/lesnouvellesorganisations/events/227195680/
Amazon Redshift Update and How Equinox Fitness Clubs Migrated to a Modern Dat... - Amazon Web Services
In this session, we provide an update on Amazon Redshift, and look at a case study from Equinox Fitness Clubs. We show you how Amazon Redshift queries data across your data warehouse and data lake, without the need or delay of loading data, to deliver insights you cannot obtain by querying independent data silos. Discover how Equinox Fitness Clubs transitioned from on-premises data warehouses and data marts to a cloud-based, integrated data platform, built on AWS and Amazon Redshift. Learn about their journey from static reports, redundant data, and inefficient data integration to a modern and flexible data lake and data warehouse architecture that delivers dynamic reports based on trusted data.
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod... - Hortonworks
Many enterprises are turning to Apache Hadoop to enable Big Data Analytics and reduce the costs of traditional data warehousing. Yet, it is hard to succeed when 80% of the time is spent on moving data and only 20% on using it. It’s time to swap the 80/20! The Big Data experts at Attunity and Hortonworks have a solution for accelerating data movement into and out of Hadoop that enables faster time-to-value for Big Data projects and a more complete and trusted view of your business. Join us to learn how this solution can work for you.
Amazon Web Services provides a broad range of services to help you build and deploy big data analytics applications quickly and easily. AWS gives you fast access to flexible, low-cost IT resources, so you can rapidly scale virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and Internet of Things processing. With AWS you do not need to make large upfront investments of time or money to build and maintain infrastructure. Instead, you can provision exactly the right type and size of resources you need to power your big data analytics applications. You can access as many resources as you need, almost instantly, and pay only for what you use.
The Data World Distilled
Understanding how the data world works in the Big Data era
I created this slide deck as a learning tool for new employees, and I figured I would post it in case it can help others understand the data space.
This slide deck covers:
- Big Data
- Data Warehouses
- ETL/Data Integration
- Business Intelligence and Analytics
- Data Quality
- Data Testing
- Data Governance
It provides a brief description along with key vendors in the space.
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ... - Amazon Web Services
Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift
Customizing the customer experience based on user behavior is a constant challenge for today’s consumer apps. Business intelligence helps analyze and model large amounts of data. Looker offers a modern approach to BI leveraging AWS that’s fast, agile, and easy to manage. Join this webinar to learn how MessageMe, which provides emotionally engaging messaging apps to consumers, leverages Looker business intelligence software and the Amazon Redshift data warehouse service to analyze billions of rows of customer data in seconds.
Webinar topics include:
• How MessageMe turns billions of rows of customer data stored in Amazon Redshift into actionable insights
• How Looker connects directly to Amazon Redshift in just a few clicks, enabling MessageMe to build modern big data analytics in the cloud
Who should attend:
• Information or Solution Architects, Data Analysts, BI Directors, DBAs, Development Leads, Developers, or Technical IT Leaders.
Presenters:
• Justin Rosenthal, CTO, MessageMe
• Keenan Rice, VP, Marketing & Alliances, Looker
• Tina Adams, Senior Product Manager, AWS
Transform your DBMS to drive engagement innovation with Big Data - Ashnikbiz
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS in order to drive digital business. They explain how Postgres enables you to support a wider range of workloads with your relational database, which opens the door to Big Data. They also cover EnterpriseDB’s strategy around Big Data, which focuses on three areas, and, last but not least, how to find money in IT with Big Data and digital transformation.
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish... - Amazon Web Services
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios - kcmallu
What's the origin of Big Data? What are the real life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organizations?
Using real time big data analytics for competitive advantage - Amazon Web Services
Many organisations find it challenging to perform real-time data analytics successfully using their own on-premises IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can be a costly and time-consuming exercise.
Most of the time, infrastructure is under-utilised and it’s near impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
Modern apps and services are leveraging data to change the way we engage with users in a more personalized way. Skyla Loomis talks big data, analytics, NoSQL, SQL and how IBM Cloud is open for data.
Learn more by visiting our Bluemix Hybrid page: http://ibm.co/1PKN23h
Choosing technologies for a big data solution in the cloud - James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
The Impact of SMACT on the Data Management Stack - SnapLogic
This presentation introduces the concept of the "Integrator's Dilemma" and reviews some of the challenges faced by traditional data and application integration technologies when it comes to keeping up with the new enterprise data, application and API connectivity and management requirements. We review the landscape and share examples of the steps more and more IT organizations are taking to improve business alignment through faster access to trusted data.
To learn more, visit http://www.snaplogic.com/ipaas
How to build forecasting services using ML and deep learn... - Amazon Web Services
Forecasting is an important process for many companies and is used in a variety of areas to accurately predict the growth and distribution of a product, the use of the resources needed on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we show how to pre-process data that contains a time component and then use an algorithm that produces an accurate forecast based on the type of data analyzed.
Big Data for Startups: how to build Big Data applications in Server... - Amazon Web Services
The variety and volume of data created every day is growing ever faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters appears to be an investment that only established companies can afford. But the elasticity of the cloud and, in particular, serverless services allow us to break through these limits.
Let's see how to develop Big Data applications quickly, without worrying about infrastructure, dedicating all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we present the main features of the service and how to deploy your application in a few steps.
Twenty years ago Amazon went through a radical transformation aimed at increasing the pace of innovation. Over this period we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances - Amazon Web Services
Container usage keeps growing.
When designed correctly, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, yielding average savings of 70% compared with On-Demand Instances. In this session we explore the characteristics of Spot Instances and how they can easily be used on AWS. We also learn how Spreaker uses Spot Instances to run different kinds of applications, in production, at a fraction of the On-Demand cost!
In recent months, many customers have been asking us the question – how to monetise Open APIs, simplify Fintech integrations and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's market offering unique with services for Machine Lea... - Amazon Web Services
To create value and build a distinctive, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we look at how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployments of... - Amazon Web Services
With the traditional approach to IT, it was difficult for many years to implement DevOps techniques, which until now have often involved manual activities that occasionally led to application downtime and interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, ensuring greater system reliability and resulting in significant improvements in business continuity.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Find out how to use AWS OpsWorks to ensure the reliability of your application installed on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads - Amazon Web Services
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support Group Policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis using artificial intelligence techniques is evolving and improving at a rapid pace. In this webinar we explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJS - Amazon Web Services
Many companies today build applications with ledger-type functionality, for example to verify the history of credits and debits in banking transactions, or to track the supply chain flow of their products.
At the core of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we see how to build a complete serverless application that uses QLDB's features.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for delivering an exceptional user experience to end users. In this session we learn how to tackle modern API design challenges with GraphQL, an open source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We dig into several scenarios to understand how AppSync can help address these use cases by creating modern APIs with real-time and offline data update capabilities.
We also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Database and VMware Cloud™ on AWS: myths to debunk - Amazon Web Services
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, and on top of that there are performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and simplify the migration of Oracle workloads and accelerate the move to the cloud; they also dig into the architecture and demonstrate how to take full advantage of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer that controls deployment and the related lifecycle. In this session we present the main features of the service, the reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
2. Today’s agenda
• Amazon Redshift Overview
• Use cases and benefits
• Migration options
• Scholastic’s use case
• Architecture details
• Technical overview
• Key project learnings
3. Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon Redshift
a lot faster
a lot simpler
a lot cheaper
4. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical
representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any
vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Forrester Wave™ Enterprise Data Warehouse Q4 ’15
6. Why migrate to Amazon Redshift?
Transactional database:
• 100x faster
• Scales from GBs to PBs
• Analyze data without storage constraints
MPP database:
• 10x cheaper
• Easy to provision and operate
• Higher productivity
Hadoop:
• 10x faster
• No programming
• Standard interfaces and integration to leverage BI tools, machine learning, streaming
7. Migration from Oracle @ Boingo Wireless
2000+ Commercial Wi-Fi locations
1 million+ Hotspots
90M+ ad engagements
100+ countries
Legacy DW: Oracle 11g based DW
Before migration
Rapid data growth slowed
analytics
Mediocre IOPS, limited memory,
vertical scaling
Admin overhead
Expensive (license, h/w, support)
After migration
180x performance improvement
7x cost savings
9. Migration from Greenplum @ NTT Docomo
68 million customers
10s of TBs per day of data across
mobile network
6PB of total data (uncompressed)
Data science for marketing
operations, logistics etc.
Legacy DW: Greenplum on-premises
After migration:
125 node DS2.8XL cluster
4,500 vCPUs, 30TB RAM
6 PB uncompressed
10x faster analytic queries
50% reduction in time for new BI
app. deployment
Significantly less ops. overhead
10. Migration from SQL on Hadoop @ Yahoo
Analytics for website/mobile events
across multiple Yahoo properties
On an average day
2B events
25M devices
Before migration: Hive – Found it to be
slow, hard to use, share and repeat
After migration:
21 node DC1.8XL (SSD)
50TB compressed data
100x performance improvement
Real-time insights
Easier deployment and
maintenance
11. Migration from SQL on Hadoop @ Yahoo
[Chart: query time in seconds (log scale, 1 to 10,000), Amazon Redshift vs. Impala, for four query types: Count Distinct Devices, Count All Events, Filter Clauses, Joins]
12. Business Value and Productivity
Business Productivity Benefits
Analyze more data
Faster time to market
Get better insights
Match capacity with demand
13. ENGINE X Amazon Redshift
ETL Scripts
SQL in reports
Ad hoc queries
How to Migrate?
Schema Conversion Database Migration
Map data types
Choose compression
encoding, sort keys,
distribution keys
Generate and apply DDL
Schema & Data
Transformation
Data Migration
Convert SQL Code
Bulk Load
Capture updates
Transformations
Assess Gaps
Stored Procedures
Functions
14. Convert schema in a few clicks
Sources include Oracle, Teradata,
Greenplum and Netezza
Automatic schema optimization
Converts application SQL code
Detailed assessment report
AWS Schema
Conversion Tool
(AWS SCT)
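The schema conversion step described above (map data types, choose compression encoding, sort keys and distribution keys, generate DDL) can be sketched roughly as follows. This is a minimal illustration, not what AWS SCT does internally: the type map, the `lzo` default encoding, the choice to leave sort key columns as `raw`, and the function names `map_type` and `build_ddl` are all assumptions made for the example.

```python
# Illustrative only: this type map is an assumption for the sketch,
# not the mapping that AWS SCT applies.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "DECIMAL",
    "DATE": "TIMESTAMP",  # Oracle DATE carries a time component
}


def map_type(source_type):
    """Map e.g. 'VARCHAR2(50)' to 'VARCHAR(50)', keeping length/precision."""
    base, sep, rest = source_type.partition("(")
    base = base.strip().upper()
    return TYPE_MAP.get(base, base) + sep + rest


def build_ddl(table, columns, dist_key, sort_keys, encoding="lzo"):
    """Build a Redshift CREATE TABLE statement.

    columns: list of (column_name, source_type) tuples.
    """
    col_lines = []
    for name, source_type in columns:
        # Assumption: leave sort key columns uncompressed, compress the rest.
        enc = "raw" if name in sort_keys else encoding
        col_lines.append(f"    {name} {map_type(source_type)} ENCODE {enc}")
    return (
        f"CREATE TABLE {table} (\n"
        + ",\n".join(col_lines)
        + f"\n)\nDISTKEY ({dist_key})\nSORTKEY ({', '.join(sort_keys)});"
    )


if __name__ == "__main__":
    # Hypothetical table, for illustration only.
    print(build_ddl(
        "orders",
        [("order_id", "NUMBER(18,0)"), ("order_date", "DATE"),
         ("customer_name", "VARCHAR2(100)")],
        dist_key="order_id",
        sort_keys=["order_date"],
    ))
```

In practice the distribution and sort keys would be chosen from the join and filter patterns of the actual workload rather than hard-coded.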
16. Start your first migration in a few minutes
Sources include: Aurora, Oracle, SQL
Server, MySQL and PostgreSQL
Bulk load and continuous replication
Migrate a TB for $3
Fault tolerant
AWS Database Migration Service (AWS DMS)
21. Where were we?
Platform
13+ years old. IBM AS/400 DB2 and Microsoft SQL Server are the primary data
warehouse platforms. BI Platform is primarily Microsoft (SSRS, SSAS, Excel, SharePoint)
500+ direct users across every LOB and business function
20+ TB. 5,500+ DB2 workloads, 350+ SQL Server workloads, 15 SSAS cubes, 150+
SSRS reports
Challenges
Inflexible, multi-layered architecture – slow time to market
Inability to meet internal SLAs due to performance of daily ETL processes
Scalability limitations with SQL Server Analysis Services (SSAS) for reports
Limited ability to perform self-service Business Intelligence
22. Moving forward: Key decision factors
• Improved performance, scalability, availability,
logging, security
• Enablement of self service business intelligence
• Leverage the skill set of current team (Relational DB
& SQL)
• Integration with existing technology stack
• Alignment with the tech strategy (devops model,
Cloud First)
• Ability to support Big Data initiatives
• Team up with an experienced consulting partner
23. Why we chose AWS and Amazon Redshift
AWS was chosen for its agility, scalability, elasticity, and
security
Redshift
• Scalable, fast
• Managed service, cost-optimization models,
elastic
• SQL/relational matched skillset of team
S3 was chosen as location for ingestion process
NorthBay was chosen as the implementation partner for
their expertise in Big Data and Redshift migrations
24. How the project unfolded
Goals
• 3-month pilot to migrate a functional area in a key LOB
• Demonstrate immediate business value
• Use AWS Stack & Open Source for Data Movement from DB2
(No CDC/ETL tool)
Outcomes
• Core Framework for Migration
• ELT Architecture and Validation
• Visualization/Self-service capability through Tableau
26. Core Framework
• Jobs and Job Groups are defined as metadata in DynamoDB
• Control-M scheduler, Custom Application and Data Pipeline for
Orchestration
• ELT Process with EMR/Sqoop for Extraction. Load and Transform
the data through Redshift SQL scripts
• Core Framework enables
• Restart capability from point of failure
• Capturing of operational statistics (# of rows updated, etc.)
• Audit capability (which feed caused the Fact to change, etc.)
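The restart-from-point-of-failure and operational-statistics ideas above can be illustrated with a small sketch. It is not Scholastic's framework: plain dictionaries stand in for the DynamoDB metadata, and `run_job_group` and the job names are hypothetical.

```python
def run_job_group(jobs, state, stats):
    """Run jobs in order, skipping any already marked DONE in `state`.

    jobs:  list of (job_name, callable returning the number of rows affected)
    state: dict job_name -> "DONE" or "FAILED" (stand-in for a metadata table)
    stats: dict job_name -> rows affected (operational statistics)
    Returns True if the whole group completed, False if a job failed.
    """
    for name, action in jobs:
        if state.get(name) == "DONE":
            continue  # finished in an earlier attempt, do not repeat it
        try:
            stats[name] = action()
            state[name] = "DONE"
        except Exception:
            state[name] = "FAILED"
            return False  # stop here; a rerun resumes from this job
    return True


if __name__ == "__main__":
    state, stats = {}, {}
    jobs = [("extract_orders", lambda: 1000), ("load_orders", lambda: 1000)]
    print(run_job_group(jobs, state, stats), state, stats)
```

Because completion is recorded per job, rerunning the same group after a failure repeats only the failed job and those after it.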
27. Extract
• Pre-create EMR resources at the start of Batch
• Achieve parallelism in Sqoop with mappers and Fair Scheduling
• Sqoop query to add additional fields like Batch_id, Updated_date, etc.
• Data extracts are split and compressed for optimized loading into Redshift
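As a rough sketch of the extract step, the helper below assembles a Sqoop free-form query import that adds batch columns, runs several mappers in parallel, and writes gzip-compressed part files. The function name, connection string, table and directory are hypothetical, credentials options are omitted, and the use of `CURRENT_TIMESTAMP` for the update date is an assumption rather than the query used in the project.

```python
def build_sqoop_import(jdbc_url, table, split_by, target_dir,
                       batch_id, num_mappers=8):
    """Return a Sqoop import command as an argument list."""
    # Sqoop requires the literal $CONDITIONS token in a free-form query;
    # each mapper substitutes its own range predicate on the split column.
    query = (
        f"SELECT t.*, {int(batch_id)} AS batch_id, "
        f"CURRENT_TIMESTAMP AS updated_date "
        f"FROM {table} t WHERE $CONDITIONS"
    )
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--query", query,
        "--split-by", split_by,              # column used to divide the work
        "--num-mappers", str(num_mappers),   # one output part file per mapper
        "--target-dir", target_dir,
        "--compress",
        "--compression-codec", "org.apache.hadoop.io.compress.GzipCodec",
    ]


if __name__ == "__main__":
    # All values below are placeholders.
    print(" ".join(build_sqoop_import(
        "jdbc:as400://example-host/EXAMPLELIB", "ORDERS", "ORDER_ID",
        "s3://example-bucket/extracts/orders/batch_42/", batch_id=42)))
```

Each mapper writes its own compressed part file, which gives the split, compressed extracts that load efficiently into Amazon Redshift.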
[Diagram: AS400 / DB2, EMR with Sqoop, S3, Metadata, KMS, Data Pipeline; numbered steps 1-6; legend: Control Flow, Data Flow]
28. Load
• Truncate and Load through Data Pipeline for Staging tables
• Dynamic Work Load Management (WLM) queues setup to allow maximum
resources during Loading/Transformation
• Check and terminate any locks on tables to allow truncation
• Capture metrics related to number of rows loaded, time taken, etc.
[Diagram: S3, Staging, KMS, Data Pipeline, EC2; numbered steps 1-4; legend: Control Flow, Data Flow]
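The truncate-and-load pattern for staging tables could look roughly like the sketch below, which only generates the SQL text. The helper name, table, S3 prefix, IAM role ARN and delimiter are placeholders, and the deck does not show the actual statements used.

```python
def build_load_statements(staging_table, s3_prefix, iam_role_arn,
                          delimiter=","):
    """Return the SQL statements for one truncate-and-load cycle."""
    return [
        # Empty the staging table before reloading it.
        f"TRUNCATE TABLE {staging_table};",
        # COPY reads every file under the prefix in parallel;
        # GZIP tells it the extract files are gzip-compressed.
        f"COPY {staging_table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' GZIP DELIMITER '{delimiter}';",
        # Rows loaded by the most recent COPY in this session (load metric).
        "SELECT pg_last_copy_count();",
    ]


if __name__ == "__main__":
    for stmt in build_load_statements(
            "stg_orders",
            "s3://example-bucket/extracts/orders/batch_42/",
            "arn:aws:iam::123456789012:role/example-load-role"):
        print(stmt)
```

Pointing COPY at a prefix that holds several compressed files lets the cluster load them in parallel, which is why the extracts are split upstream.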
29. Transform
• Custom Application for building Dimensions and Facts
• SQL Scripts are stored in S3 and executed by ELT process
• SQL scripts refactored from SQL Server and AS400 scripts
• Non-Functional Requirements are achieved through Custom App
[Diagram: S3, Staging, Facts, Metadata, Dimensions, App; numbered steps 1-7b; legend: Control Flow, Data Flow]
30. Schema Design
• Modified Star Schema
• Natural Keys instead of generating unique identifiers
• Commonly used columns from Dimensions are copied over to
Facts
• Surrogate keys are eliminated except for a few cases
• Compression
• Define appropriate Distribution and Sort Keys
• Define primary key and Foreign keys
31. Security
• AWS Key Management Service (KMS) is used for encrypting
access credentials to Source and Target databases
• Jenkins job to allow encrypting of credentials using KMS
directly by Database Administrators
• Amazon EMR, Jenkins resources are given KMS decrypt
permissions to allow connecting to Sources and Targets during
the ELT process
• Standard Security in Transit and at Rest throughout the process
• IAM federation through Enterprise Active Directory
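A minimal sketch of how an ELT process might recover a KMS-encrypted credential at run time is shown below. It assumes the credential was stored as base64-encoded ciphertext and that the caller passes in a KMS client such as `boto3.client("kms")`; the function name and storage format are assumptions, not details from the deck.

```python
import base64


def decrypt_credential(kms_client, ciphertext_b64):
    """Decrypt a base64-encoded KMS ciphertext and return it as text.

    kms_client is expected to expose the KMS Decrypt operation, as
    boto3.client("kms") does; it is injected so the caller controls
    region and permissions.
    """
    response = kms_client.decrypt(
        CiphertextBlob=base64.b64decode(ciphertext_b64)
    )
    # The Decrypt response carries the plaintext as bytes.
    return response["Plaintext"].decode("utf-8")
```

A caller running with KMS decrypt permission would use it as `decrypt_credential(boto3.client("kms"), stored_value)`, so the plaintext password exists only in memory during the run.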
32. Reporting
• Business users access Facts/Dimensions through Tableau
• Power users access Staging tables through Tableau
• Data analysts access files in S3 using Hive/Presto
• Self-service capability across business users
[Diagram: S3, Staging, Facts/Dimensions, Business Analysts, Power Users, Data Analysts, EMR with Presto/Hive]
33. Workstream Effort
• Define Jobs and Job Groups specific to each
Workstream
• Create Redshift tables (Staging, Facts, Dimensions)
based on mapping from AS400 and best practices
learned
• Create new SQL scripts (based on the logic from
AS400/SQL Server code) for transformation
• Develop, Test and Deploy in 2-week Agile sprints
34. Key Lessons - Technical
• Isolate the core framework from project-specific code repositories
• Consolidating logging solution across Amazon S3, Amazon
Redshift, Amazon DynamoDB etc., was a challenge
• Make appropriate schema changes when migrating to new
platform
• Custom framework for gathering operational stats (e.g., # of
rows loaded)
• Start with Test Automation tools and Acceptance Test Driven
Development (ATDD) earlier in the project
35. Project timeline revisited
After the successful pilot:
• Executive Leadership accelerated timeline:
• Reduce project timeline by 50% (to 12 months) to
deliver value faster to LOBs
• Realize cost savings by eliminating the DB2 and
SQL Server platforms earlier
• Users wanted to be on the new platform!
• Scholastic & NorthBay partnered to create a
training curriculum to ensure a supply of skilled
staff would be available to our teams
36. Scaling up: 7 workstreams
• Developed a model for estimating effort and cost
(AWS costs & Labor per LOB migration)
• Running agile teams in parallel – employed Agile
coaches
• Enhanced the core framework to ensure it would
scale effectively when in use by multiple teams
simultaneously
• Building a Code repository for use by all teams
• Building CI / CD Frameworks
37. Where are we now?
• 4 of 7 LOBs migrated – framework enables complete migration of a
functional area within days/weeks as opposed to months. On track to
migrate and decommission entire legacy environment within next 6
months
• 10 weeks to migrate from an external vendor hosting data and providing
reports for one LOB
• Cost of Data Ingestion Framework is under $40/day (EC2, EMR, Data
Pipeline)
• First “Big Data” initiative in production captures and processes an
average of 1.5 million e-reading events daily (peak: 7 million)
• Profile: LOB #1
• Loading ~5-6 Million rows/day (6-7GB/day)
• Processing over 1.5 billion rows within Redshift daily
• Complete ETL/ELT batch cycle performance improved by over 170%
38. Key lessons – project execution
• Essential to monitor and optimize AWS costs
• “Data Champion” / “Data Guide” partnership absolutely critical for
successful adoption of new platforms
• Importance of strong Agile coaches while scaling out Agile teams
• Criticality of choosing consulting partners (AWS & NorthBay)
who can ramp up and supply key resources fast and cycle off the
project when finished
• Creating new data platforms and migrating data into them is
easy, especially with AWS. Decommission of existing data
platforms is hard!
41. Related Sessions
Hear from other customers discussing their Amazon Redshift use cases:
• BDM402—Best Practices for Data Warehousing with Amazon Redshift (King.com)
• BDA304—What’s New with Amazon Redshift
• SVR308—Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year
• GAM301—How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful
Player Insights
• BDA207—Fanatics: Deploying Scalable, Self-Service Business Intelligence on AWS
• BDM306— Netflix: Using Amazon S3 as the fabric of our big data ecosystem
• BDA203 — Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift
(GE Power and Water)
• BDM206 — Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT
Analytics Platform on AWS (Hello)
• STG307— Case Study: How Prezi Built and Scales a Cost-Effective, Multipetabyte Data Platform
and Storage Infrastructure on Amazon S3