[db tech showcase Tokyo 2019] Azure Cosmos DB Deep Dive ~ Partitioning, Global Distribution and Indexing ~
https://satonaoki.wordpress.com/2019/09/30/dbts2019-azure-cosmos-db-deep-dive/
NoSQL Strikes Back (An introduction to the dark side of your data)
A long time ago in a database far, far away...
For a long time, SQL was the only option for saving vast amounts of application data. There were always rebel activities trying to overcome the SQL Empire, which brought a new hope, but all other ways of storing data never amounted to more than a phantom menace.
Now Cosmos DB awakens and is ready for the revenge of the NoSQL.
During this talk, we will look at what Azure Cosmos DB is, what you can achieve with it, and how to use it in a galactic environment of data and applications.
Join me and find your way to the right solution for your application.
May the data be with you!
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series (Amazon Web Services)
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
Explore DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about their use case, taking advantage of fast performance on enormous datasets by leveraging economies of scale on the AWS platform.
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift (Amazon Web Services)
An overview of how Amazon Redshift uses columnar technology, massively parallel processing, and other techniques to deliver fast query performance on petabyte-size datasets.
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service (Amazon Web Services)
Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as example and show you how to build an end-to-end analytics solution.
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB (Amazon Web Services)
If you’re familiar with relational databases, designing your app to use a NoSQL database like DynamoDB may be new to you. In this webinar, we’ll walk you through common data design patterns for a variety of applications to help you learn how to design a schema, then store and retrieve the data with DynamoDB. We will discuss the benefits of using DynamoDB to develop mobile, web, IoT, and gaming apps.
Learning Objectives:
Learn schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others
Who Should Attend:
Architects, Developers, and SysOps interested in learning how to design NoSQL schemas to support mobile, web, IoT, AdTech, and gaming apps.
Familiarity with DynamoDB is helpful
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics with AWS services such as Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Created by: Jason Morris, Solutions Architect
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You'll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform in order to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
(BDT303) Running Spark and Presto on the Netflix Big Data Platform (Amazon Web Services)
In this session, we discuss how Spark and Presto complement the Netflix big data platform stack that started with Hadoop, and the use cases that Spark and Presto address. Also, we discuss how we run Spark and Presto on top of the Amazon EMR infrastructure; specifically, how we use Amazon S3 as our data warehouse and how we leverage Amazon EMR as a generic framework for data-processing cluster management.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and use workload management.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
• Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Who Should Attend:
• Data Warehouse Developers, Big Data Architects, BI Managers, and Data Engineers
During a Big Data Warehousing Meetup in NYC, Elliott Cordo, Chief Architect at Caserta Concepts, discussed emerging trends in real-time data processing. The presentation included processing frameworks such as Spark and Storm, as well as datastore technologies ranging from NoSQL to Hadoop. He also discussed exciting new AWS services such as Lambda, Kinesis, and Kinesis Firehose.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this webinar, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to design schemas and load data efficiently
• Learn best practices for workload management, distribution and sort keys, and optimizing queries
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
This 1-day course provides hands-on skills in ingesting, analyzing, transforming, and visualizing data using Amazon Athena, and getting the best performance when using it at scale.
Audience:
This class is intended for data engineers, analysts and data scientists responsible for: analyzing and visualizing big data, implementing cloud-based big data solutions, deploying or migrating big data applications to the public cloud, implementing and maintaining large-scale data storage environments, and transforming/processing big data.
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d... (Amazon Web Services)
Amazon Redshift Spectrum is a new feature that extends Amazon Redshift’s analytics capabilities beyond the data stored in your data warehouse to also query your data in Amazon S3. You can use Amazon Redshift and your existing business intelligence tools to run SQL queries against exabytes of data, and Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast – even with large data sets and complex queries.
Data collection and storage is a primary challenge for any big data architecture. This session will focus on the different types of data that customers are handling to drive high-scale workloads on AWS. Our goal is to help you choose the best approach for your workload. We will dive into optimization techniques that improve performance and reduce the cost of data ingestion and AWS services including Amazon S3, DynamoDB, and Kinesis.
Created by: Mark Korver, Senior Solutions Architect
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
Amazon DynamoDB is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.
Learning Objectives:
Understand the differences between relational and non-relational databases
Learn about common use cases for DynamoDB across gaming, ad tech, IoT, and more
See how DynamoDB helps customers handle spikes in traffic and save development time for new feature launches
Who Should Attend:
Developers, IT Decision Makers, and Executives interested in learning more about Amazon Web Services’ serverless NoSQL service to scale mobile, web, IoT, ad tech, and gaming apps
The event, held on 27th April 2019, was part of the Global Azure Bootcamp and covered Microsoft's Cosmos DB, more specifically:
- Introduction to Cosmos DB, its features, internals, resource models, and request units.
- DEMO: Create an SQL API. Download sample .NET app. Simple queries.
- Covered Change Feed and showcased various use case scenarios.
- Detailed the implications of Global Distribution and Consistency Models.
- DEMO: Mongo - lift and shift. Run simple .NET code against MongoDB (in a Docker container) and Cosmos DB.
- Introduction to TinkerPop graphs.
- DEMO: Graphs API. Download sample .NET app. Simple queries.
https://techspark.mt/global-azure-bootcamp-27th-april-2019/
Modeling data and best practices for Azure Cosmos DB (Mohammad Asif)
Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. In this session we covered data modeling with the NoSQL Cosmos database and how it helps distributed applications maintain high availability, scale across multiple regions, and manage throughput.
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. Learn the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service, and see the DynamoDB console first-hand. See a walk-through demo of building a serverless web application using this high-performance key-value and JSON document store.
Data collection and storage is a primary challenge for any big data architecture. In this session, we will describe the different types of data that customers are handling to drive high-scale workloads on AWS, and help you choose the best approach for your workload. We will cover optimization techniques that improve performance and reduce the cost of data ingestion. AWS services to be covered include: Amazon S3, DynamoDB, and Kinesis.
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and start analysing your data right away using your existing business intelligence tools.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Building the Perfect SharePoint 2010 Farm - SPS Sacramento (Michael Noel)
Slide deck from Michael Noel's session on Best Practices SharePoint 2010 infrastructure, as presented at SharePoint Saturday Sacramento, 18 June, 2011.
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010 (Ivan Provalov)
Two presentations from the Michigan Information Retrieval Enthusiasts Group Meetup on August 19, by the Cengage Learning search platform development team.
Scaling Performance Tuning With Lucene, by John Nader, discusses the primary performance hot spots related to scaling to a multi-million document collection, including the team's experiences with memory consumption, GC tuning, query expansion, and filter performance. It covers both the tools used to identify issues and the techniques used to address them.
Relevance Tuning Using TREC Dataset, by Rohit Laungani and Ivan Provalov, describes the TREC dataset used by the team to improve the relevance of the Lucene-based search platform. It goes over an IBM paper and describes the approaches tried: Lexical Affinities, Stemming, Pivot Length Normalization, Sweet Spot Similarity, and Term Frequency Average Normalization. It also talks about Pseudo Relevance Feedback.
Stephan Ewen - Experiences Running Flink at Very Large Scale (Ververica)
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk explains which aspects currently make a job particularly demanding, shows how to configure and tune a large-scale Flink job, and outlines what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into analyzing and tuning checkpointing, selecting and configuring state backends, understanding common bottlenecks, and understanding and configuring network parameters.
GECon 2017: High-volume Data Streaming in Azure - Aliaksandr Laisha (GECon Org Team)
The session focuses on solutions that require high-throughput ingestion and streaming of data in real time. You'll get familiar with different business use cases and architecture examples to get a common idea, as well as understand the concepts of stream processing systems. Next, you'll get deep insights into the functional and non-functional capabilities of the Azure Event Hub service to see how it fits into the whole picture. Moreover, we'll take a look at how to leverage Azure Cosmos DB for high-throughput streaming when Event Hub is not suitable for various reasons.
* [Developers Festa Sapporo 2020] Microsoft/GitHubが提供するDeveloper Cloud (Developer Cloud from Microsoft/GitHub) (Naoki (Neo) SATO)
* [Developers Festa Sapporo 2020] Microsoft/GitHubが提供するDeveloper Cloud (Developer Cloud from Microsoft/GitHub)
* https://satonaoki.wordpress.com/2020/12/05/devfesta-microsoft-github/
* https://www.youtube.com/watch?v=sqWnreBtHBg&t=151s
How to work with technology to survive as an engineer (エンジニアとして生き残るためのテクノロジーとの向き合い方) (Naoki (Neo) SATO)
How to work with technology to survive as an engineer (エンジニアとして生き残るためのテクノロジーとの向き合い方)
https://satonaoki.wordpress.com/2019/07/20/how-to-work-with-technology-to-survive-as-an-engineer/
9. Overview of partitioning
The application writes data and provides a partition key value with every item.
[Diagram: a client application (write) and another client application (read) use a container provisioned with 15,000 RUs, spread across physical partition 1 (7,500 RUs) and physical partition 2 (7,500 RUs)]

10. Overview of partitioning
Cosmos DB uses the partition key value to route data to a partition.

11. Overview of partitioning
Every physical partition can store up to 50 GB of data and serve up to 10,000 RU/s.

12. Overview of partitioning
The total throughput for the container will be divided evenly across all partitions.

13. Overview of partitioning
If more data or throughput is needed, Cosmos DB will add a new partition automatically.
[Diagram: the container's 15,000 RUs are now spread across physical partitions 1, 2, and 3 with 5,000 RUs each]

14. Overview of partitioning
The data will be redistributed as a result.

15. Overview of partitioning
And the total throughput capacity will be divided evenly between all partitions.

16. Overview of partitioning
To read data efficiently, the app must provide the partition key of the documents it is requesting.
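
The write and read paths above can be sketched in code. The deck's demos use .NET, but as a compact illustration, here is a minimal sketch with the azure-cosmos Python SDK; the endpoint, key, database/container names, and the /customerId partition key path are assumptions, and keyword arguments can differ between SDK versions.

from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical account endpoint and key.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="appdb")

# The container's provisioned throughput (here 15,000 RU/s) is spread evenly
# across its physical partitions by the service.
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=15000,
)

# Write path: every item carries a partition key value (customerId), which
# Cosmos DB uses to route the item to a physical partition.
container.upsert_item({
    "id": "order-1001",
    "customerId": "customer-42",
    "total": 63.50,
})

# Read path: supplying the partition key enables an efficient point read that
# is routed to a single physical partition.
item = container.read_item(item="order-1001", partition_key="customer-42")
print(item["total"])
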
18. How is data distributed?
[Diagram: data with partition keys flows through a hashing algorithm {#} that maps each key to a range of partition addresses, which in turn map to physical partitions]

19. How is data distributed?
Whenever a document is inserted, the partition key value will be checked and the document assigned to a physical partition.

20. How is data distributed?
The item will be assigned to a partition based on its partition key (e.g. pk = 1).

21. How is data distributed?
All partition key values will be distributed amongst the physical partitions.

22. How is data distributed?
However, items with the exact same partition key value will be co-located on the same physical partition.
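
The routing idea can be illustrated with a toy hash function. This is purely a conceptual sketch of hash partitioning with a made-up partition count; it is not Cosmos DB's actual hashing algorithm or partition-address layout.

import hashlib

NUM_PHYSICAL_PARTITIONS = 3

def route(partition_key_value: str) -> int:
    """Map a partition key value to a physical partition (conceptual only)."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).hexdigest()
    # Treat the hash as a number and fold it into one of the partition ranges.
    return int(digest, 16) % NUM_PHYSICAL_PARTITIONS

# Distinct key values spread across partitions, while items that share the
# same partition key value always land on the same partition (co-location).
for pk in ["customer-42", "customer-42", "customer-7", "customer-13"]:
    print(pk, "-> physical partition", route(pk))
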
49. An efficient partitioning strategy has a close-to-even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges.

50. An efficient partitioning strategy has a close-to-even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges. A random partition key can provide an even data distribution.
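
One common way to get such a random, even spread is a synthetic partition key, for example an entity id combined with a random suffix. The sketch below is a generic illustration of that pattern (the field names are made up); the trade-off is that reads for one device may need to fan out across all suffixes.

import random
import uuid

def make_item(device_id: str, reading: float) -> dict:
    """Build an item whose partition key is randomized across 10 buckets."""
    return {
        "id": str(uuid.uuid4()),
        "deviceId": device_id,
        "partitionKey": f"{device_id}-{random.randint(0, 9)}",  # synthetic key
        "reading": reading,
    }

print(make_item("thermostat-7", 21.5))
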
55.

Database Account (per tenant)
• Isolation knobs: independent geo-replication knobs; multiple throughput knobs (dedicated throughput, eliminating noisy neighbors); group tenants within database account(s) based on regional needs
• Throughput requirements: >400 RUs per tenant (> $24 per tenant)
• T-shirt size: Large (example: premium offer for B2B apps)

Container w/ Dedicated Throughput (per tenant)
• Isolation knobs: independent throughput knobs (dedicated throughput, eliminating noisy neighbors); easy management of tenants (drop container when tenant leaves)
• Throughput requirements: >400 RUs per tenant (> $24 per tenant)
• T-shirt size: Large (example: premium offer for B2B apps)

Container w/ Shared Throughput (per tenant)
• Isolation knobs: share throughput across tenants grouped by database (great for lowering cost on “spiky” tenants); mitigate noisy-neighbor blast radius (group tenants by database)
• Throughput requirements: >100 RUs per tenant (> $6 per tenant)
• T-shirt size: Medium (example: standard offer for B2B apps)

Partition Key (per tenant)
• Isolation knobs: share throughput across tenants grouped by container (great for lowering cost on “spiky” tenants); enables easy queries across tenants (containers act as boundary for queries); mitigate noisy-neighbor blast radius (group tenants by container)
• Throughput requirements: >0 RUs per tenant (> $0 per tenant)
• T-shirt size: Small (example: B2C apps)
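
For the "Partition Key (per tenant)" option, a single shared container holds every tenant and the tenant id doubles as the partition key. A minimal sketch with the azure-cosmos Python SDK, assuming hypothetical account details and a /tenantId key path:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="saas")

# One container shared by all tenants; each tenant maps to a logical partition.
container = database.create_container_if_not_exists(
    id="tenant-data",
    partition_key=PartitionKey(path="/tenantId"),
    offer_throughput=4000,  # shared across every tenant in the container
)

container.upsert_item({"id": "doc-1", "tenantId": "contoso", "plan": "standard"})

# A tenant-scoped query stays inside that tenant's logical partition.
contoso_docs = container.query_items(
    query="SELECT * FROM c WHERE c.tenantId = @tenant",
    parameters=[{"name": "@tenant", "value": "contoso"}],
    partition_key="contoso",
)
for doc in contoso_docs:
    print(doc["id"])
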
64. In the case of network Partitioning in a distributed computer system, one has to choose between Availability and Consistency, but Else, even when the system is running normally in the absence of partitions, one has to choose between Latency and Consistency.
72. [Diagram: Azure Traffic Manager in front of Region A, Region B, and Region C; a multi-master configuration with a master (read/write) in every region is contrasted with a single master (read/write) plus read replicas in the other regions]
78. Quorum reads and writes per consistency level:

Consistency Level  | Quorum Reads                               | Quorum Writes
Strong             | Local Minority (2 RU)                      | Global Majority (1 RU)
Bounded Staleness  | Local Minority (2 RU)                      | Local Majority (1 RU)
Session            | Single replica using session token (1 RU)  | Local Majority (1 RU)
Consistent Prefix  | Single replica (1 RU)                      | Local Majority (1 RU)
Eventual           | Single replica (1 RU)                      | Local Majority (1 RU)

[Diagram: replica set with a forwarder and followers]
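
The consistency level is set per account and can be relaxed per client. Below is a minimal sketch of setting it on the client with the azure-cosmos Python SDK; the consistency_level keyword is assumed to be available in your SDK version.

from azure.cosmos import CosmosClient

# Session consistency: point reads are served by a single replica using the
# session token (1 RU), whereas Strong and Bounded Staleness read from a
# local minority quorum (2 RU), as in the table above.
client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",
    credential="<primary-key>",
    consistency_level="Session",
)
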
83. [Diagram: a globally distributed application. Devices, mobile apps, and browsers reach Traffic Manager over the Internet; Traffic Manager routes to three regional deployments (West US 2, North Europe, Southeast Asia), each with an Application Gateway, a web tier, a middle tier behind a load balancer, and Cosmos DB]
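
In such a deployment, the application instance in each region typically points its Cosmos DB client at the nearest replica. A sketch with the azure-cosmos Python SDK, assuming the SDK exposes a preferred_locations keyword (check your SDK version) and using the regions from the diagram:

from azure.cosmos import CosmosClient

# The client tries regions in order of preference and fails over down the list.
client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",
    credential="<primary-key>",
    preferred_locations=["North Europe", "West US 2", "Southeast Asia"],
)
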
87. Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing-fast queries.

Item            | Color    | Microwave safe | Liquid capacity | CPU                                 | Memory | Storage
Geek mug        | Graphite | Yes            | 16oz            | ???                                 | ???    | ???
Coffee Bean mug | Tan      | No             | 12oz            | ???                                 | ???    | ???
Surface Book    | Gray     | ???            | ???             | 3.4 GHz Intel Skylake Core i7-6600U | 16GB   | 1 TB SSD

• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model
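
Because every path is indexed automatically, a query can filter on any property of the heterogeneous items above without first declaring a secondary index. A sketch using the azure-cosmos Python SDK, with hypothetical account details and property names taken from the example table:

from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("appdb").get_container_client("catalog")

# Filter on arbitrary properties, including one with a space in its name.
mugs = container.query_items(
    query='SELECT c.Item FROM c WHERE c.Color = @color AND c["Microwave safe"] = @safe',
    parameters=[
        {"name": "@color", "value": "Graphite"},
        {"name": "@safe", "value": "Yes"},
    ],
    enable_cross_partition_query=True,
)
for mug in mugs:
    print(mug)
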
88. Custom Indexing Policies
Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility.
• Define trade-offs between storage, write and query performance, and query consistency
• Include or exclude documents and paths to and from the index
• Configure various index types

{
  "automatic": true,
  "indexingMode": "Consistent",
  "includedPaths": [{
    "path": "/*",
    "indexes": [{
      "kind": "Range",
      "dataType": "String",
      "precision": -1
    }, {
      "kind": "Range",
      "dataType": "Number",
      "precision": -1
    }, {
      "kind": "Spatial",
      "dataType": "Point"
    }]
  }],
  "excludedPaths": [{
    "path": "/nonIndexedContent/*"
  }]
}
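
A custom policy like the one above can be supplied when the container is created. Here is a sketch with the azure-cosmos Python SDK, using the simplified includedPaths/excludedPaths form of the policy and assuming the indexing_policy keyword is supported by your SDK version:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="appdb")

# Index everything except the /nonIndexedContent subtree, trading a smaller
# index and cheaper writes for no queryability on the excluded paths.
custom_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/nonIndexedContent/*"}],
}

container = database.create_container_if_not_exists(
    id="catalog",
    partition_key=PartitionKey(path="/id"),
    indexing_policy=custom_policy,
)
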
89.
{
  "locations": [
    { "country": "Germany", "city": "Berlin" },
    { "country": "France", "city": "Paris" }
  ],
  "headquarter": "Belgium",
  "exports": [
    { "city": "Moscow" },
    { "city": "Athens" }
  ]
}

[Diagram: the document encoded as a tree of paths: locations/0/country = Germany, locations/0/city = Berlin, locations/1/country = France, locations/1/city = Paris, headquarter = Belgium, exports/0/city = Moscow, exports/1/city = Athens]
91. [Diagram: the tree of a second document (locations/0/country = Germany, locations/0/city = Bonn, locations/0/revenue = 200, headquarter = Italy, exports/0/city = Berlin, exports/1/city = Athens, dealers/0/name = Hans) shown next to the first document's tree]

92. [Diagram: the two document trees merged into a single index; shared paths such as locations/0/country = Germany and exports containing Athens are stored once, while values unique to each document (Berlin, Bonn, Paris, revenue = 200, headquarter Belgium vs. Italy, Moscow, dealers/0/name = Hans) branch off]
94. On-the-fly Index Changes
In Azure Cosmos DB, you can make changes to the indexing policy of a collection on the fly. Changes can affect the shape of the index, including paths, precision values, and its consistency model.
A change in indexing policy effectively requires a transformation of the old index into a new index.
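
A sketch of such an on-the-fly change with the azure-cosmos Python SDK: replace the container definition with a new indexing policy, and the service rebuilds the index in the background. The database/container names and the newly excluded /archived path are hypothetical.

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.get_database_client("appdb")
container = database.get_container_client("catalog")

# The partition key must stay the same; only the indexing policy changes here.
database.replace_container(
    container,
    partition_key=PartitionKey(path="/id"),
    indexing_policy={
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": "/archived/*"}],
    },
)
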
95. Metrics Analysis
The SQL API provides information about performance metrics, such as the index storage used and the throughput cost (request units) for every operation.
When running a HEAD or GET request against a collection resource, the x-ms-request-quota and x-ms-request-usage headers provide the storage quota and usage of the collection.
You can use this information to compare various indexing policies and for performance tuning.
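
The per-operation request charge is also surfaced through the SDKs. Below is a sketch with the azure-cosmos Python SDK that reads the x-ms-request-charge header after a point read; last_response_headers is an SDK implementation detail that may differ between versions.

from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("appdb").get_container_client("catalog")

item = container.read_item(item="item-1", partition_key="item-1")

# The service reports the RU cost of the operation in x-ms-request-charge.
charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
print(f"Point read consumed {charge} RUs")
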
96.
• Understand query patterns – which properties are being used?
• Understand impact on write cost – index update RU cost scales with the number of properties