Cindy Gross, aka @SQLCindy of @NealAnalytics, presents a series of "Small Bites of Big Data" lessons on how to create an HDInsight cluster on Microsoft Azure. Recordings are available via the YouTube playlist Getting Started with HDInsight: https://www.youtube.com/playlist?list=PLAD2dOpGM3s1R2L5HgPMX4MkTGvSza7gv. A blog post is available at http://blogs.msdn.com/b/cindygross/archive/2015/02/26/create-hdinsight-cluster-in-azure-portal.aspx.
The document introduces the Windows Azure HDInsight Service, which provides a managed Hadoop service on Windows Azure. It discusses big data and Hadoop, describes the components included in HDInsight like HDFS, MapReduce, Pig and Hive. It provides examples of using Pig, Hive and Sqoop with HDInsight and explains how HDInsight is administered through the management portal.
This document summarizes data storage options on Windows Azure, including hosted and host your own options. Hosted options provide higher availability and less administrative overhead, while hosting your own provides more flexibility but requires managing availability, performance, scaling, and costs. The document compares data store types like key-value, document, columnar, and graph databases and discusses considerations for choosing a data store based on data type, volume, latency needs, and growth.
This document provides an overview of using Python applications on the Microsoft Azure platform. It discusses various deployment options on Azure including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and container services. It then focuses on deploying a Python web app on Azure Web Apps using the PaaS option. Code samples are provided for the Python web app, requirements file, runtime configuration file, and web.config file. Steps for deploying the application via Git push are also outlined. The document concludes with a demonstration of managing Azure resources through the Azure SDK for Python.
The document discusses using Node.js on the Windows Azure platform. It describes how Node.js is a JavaScript runtime for building scalable network applications, and how it is fully supported on Windows Azure through deployment options like Web Sites and Cloud Services. It also introduces Web Matrix 2 as a lightweight IDE for developing Node.js applications on Windows Azure, providing features like IntelliSense and publishing capabilities.
1) Amazon ElastiCache is a fully managed in-memory data store that can be used with Redis or Memcached to improve performance for applications.
2) ElastiCache provides high availability with automatic failover between availability zones and supports various caching strategies like write-through and lazy loading.
3) Example use cases for ElastiCache include session management, database caching, APIs, streaming data, and as an event store for serverless applications.
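The two caching strategies named above can be sketched in a few lines. This is an illustrative Python sketch only, using plain dicts as stand-ins for Redis/Memcached and the backing database; the names (fetch_user_lazy, save_user_write_through, db) are made up for the example and are not ElastiCache APIs.

```python
db = {"u1": {"name": "Ada"}}   # stand-in for the backing database
cache = {}                     # stand-in for the in-memory cache

def fetch_user_lazy(user_id):
    """Lazy loading: read the cache first, fall back to the DB on a miss."""
    if user_id in cache:
        return cache[user_id]          # cache hit
    value = db.get(user_id)            # cache miss -> query the database
    if value is not None:
        cache[user_id] = value         # populate the cache for next time
    return value

def save_user_write_through(user_id, value):
    """Write-through: update the database and the cache together."""
    db[user_id] = value
    cache[user_id] = value             # cache is never stale after a write
```

Lazy loading only caches data that is actually read but can serve one stale or slow first read; write-through keeps the cache warm and consistent on writes at the cost of caching data that may never be read.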
A presentation discussing how to run a large-scale Drupal installation using Amazon Web Services (AWS). The final system is capable of serving millions of unique pages, and storing tens of terabytes of data.
First presented at DrupalCamp Brighton in January 2015. There is an hour long recording of this presentation at https://www.youtube.com/watch?v=Rh_yBzRpOnk
The document discusses various hosting solutions for Drupal including web hosting, virtual private servers, dedicated servers, and Amazon EC2. It provides details on the costs, reliability, customization options, and maintenance requirements for each solution. Additionally, it covers some key terms and tools related to using Amazon EC2, such as instances, AMIs, EBS, S3 storage, the command line interface, and the ElasticFox browser plugin.
This document provides an overview of Azure Virtual Machines, including how to provision VMs, available VM sizes and pricing, data persistence options, high availability features, networking capabilities, and load balancing options. Key points include being able to launch Windows and Linux VMs in minutes and scale from 1 to 1000s of instances with per-minute billing. VM extensions enable customization, and VMs can be made highly available through features like availability sets and fault domains. Virtual networks allow creating protected private networks in Azure that can connect to on-premises environments.
Azure Virtual Machines Deployment Scenarios (Brian Benz)
Architecture and Scenarios for deploying Database and middleware applications on Azure Virtual Machines including SQL Server, Oracle, Hadoop, and others.
Overview of Windows Azure Virtual Machines - the IaaS offering in the Windows Azure platform. The presentation covers the compute, storage and network features of Virtual Machines. It also describes how best to deploy Windows Azure cloud services and VMs.
This document discusses Amazon DynamoDB Accelerator (DAX), a fully managed caching service for DynamoDB. DAX provides in-memory caching that can improve the performance of DynamoDB by an order of magnitude by eliminating disk I/O operations and reducing latency from milliseconds to microseconds. DAX is API compatible with DynamoDB, making it easy to add caching without code changes. It provides high availability across Availability Zones and automatic replication of data. DAX can cache entire tables or specific hot keys to improve performance and reduce costs.
This presentation discusses Windows Azure Blob Storage, covering from the Windows Azure Storage Overview, Blob Storage Basic Concept, Blob Storage Advanced, and finally the Tip of the day.
This document provides instructions for a hands-on lab to set up ElastiCache for Redis, Amazon RDS for MySQL, and load Landsat satellite image data into both systems. Key steps include: creating EC2 and database instances; installing MySQL and Redis clients; loading data into the MySQL database; running a Python script to populate Redis from MySQL; and performing sample queries against each database to compare performance.
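The "populate Redis from MySQL" step of the lab can be sketched as follows. This is a minimal, self-contained approximation, not the lab's actual script: sqlite3 stands in for Amazon RDS for MySQL, a dict stands in for ElastiCache Redis, and the table and column names (scenes, scene_id, cloud_cover) are invented for the example.

```python
import sqlite3

# Stand-in for the RDS MySQL database loaded with Landsat scene metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scenes (scene_id TEXT PRIMARY KEY, cloud_cover REAL)")
conn.executemany(
    "INSERT INTO scenes VALUES (?, ?)",
    [("LC80440342015001", 12.5), ("LC80440342015017", 3.1)],
)

redis_like = {}  # stand-in for Redis: one keyed entry per row

# Walk the relational rows and mirror each into the cache, much as the
# lab's Python script does against the real MySQL and Redis endpoints.
for scene_id, cloud_cover in conn.execute("SELECT scene_id, cloud_cover FROM scenes"):
    redis_like[f"scene:{scene_id}"] = {"cloud_cover": cloud_cover}

# A key lookup in the cache now avoids a SQL round trip entirely.
assert redis_like["scene:LC80440342015001"]["cloud_cover"] == 12.5
```

The lab's performance comparison boils down to exactly this difference: a keyed in-memory lookup versus a SQL query over the network.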
We believe that security *IS* a shared responsibility: when we give developers the power to create infrastructure, security becomes their responsibility, too.
During this meetup, we'd like to share our experience implementing security best practices that development teams can apply directly to build more robust and secure cloud environments. Make cloud security your team's sport!
Cosmos DB is Microsoft's flagship serverless database service in the Azure cloud. This slide deck, presented at the Nashville Azure Meetup event on 09/20/2018, covers the why and what of Cosmos DB and is meant to be a good segue into further detailed and advanced topics. The slide deck presents three use cases for Cosmos DB in e-commerce, healthcare, and IoT. Stay tuned!
MS Cloud Day - Building web applications with Azure storage (Spiffy)
This document provides an overview and agenda for a Microsoft Cloud Day session on building web applications with Azure Storage. The session will cover Blob, Table, and Queue storage capabilities in Azure, including how to create storage accounts, upload and retrieve blobs, create and query tables, and use queues for communication between services. Attendees will learn best practices for scalability when using Azure Storage.
Karan Gulati compiled a list of informative articles and videos about HDInsight to help users get started. The document lists links to introductory articles on HDInsight and its technologies like Hadoop, Hive, and Storm. It also references two Channel9 videos and a book about using Microsoft's HDInsight service on Azure to work with big data and Hadoop. Users are encouraged to add additional useful resources to the list.
The document introduces CouchDB, an open-source document-oriented database. It notes that CouchDB uses a document model with JSON documents and map-reduce functions. It also discusses three main reasons for considering CouchDB - its focus on availability over consistency, ability to scale horizontally using map-reduce, and being well-suited for web applications via its RESTful API.
This document discusses caching services available on Windows Azure, including content delivery networks (CDNs) and caching. It describes how CDNs deliver content closer to end users, and caching stores frequently accessed data closer to Azure applications. Caching on Azure can be done through dedicated roles, co-location with applications, or shared caching services. The document outlines characteristics of CDNs like dedicated endpoints and worldwide datacenters. It also provides examples of caching configuration and workflows in Visual Studio and code samples for putting and getting items from the cache.
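The "putting and getting items from the cache" pattern mentioned above can be illustrated with a tiny put/get wrapper with absolute expiration. The deck's samples use the Azure caching .NET API; this Python sketch only mirrors the pattern, not that API, and the SimpleCache class is invented for illustration.

```python
import time

class SimpleCache:
    """Toy in-process cache with per-item TTL, mirroring put/get semantics."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds=60.0):
        # Store the value together with its absolute expiry time.
        self._items[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None                       # never cached
        value, expires_at = entry
        if time.monotonic() >= expires_at:    # expired: evict and report a miss
            del self._items[key]
            return None
        return value

cache = SimpleCache()
cache.put("greeting", "hello", ttl_seconds=30.0)
```

A shared or dedicated Azure cache follows the same contract; the difference is that the items live in a separate role or service rather than in the application's own process.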
This talk will compare the major cloud hosting companies and what products and services they offer. Google's App Engine, Amazon's AWS, Rackspace's Cloud Services and Linode will be compared. We will go beyond a mere checklist of features and dive into such topics as Perl support, cost structures, development strategies, underlying architectures, performance and security.
AWS Webcast - AWS Webinar Series for Education #3 - Discover the Ease of AWS ... (Amazon Web Services)
This webinar will emphasize how easy it is to deploy AWS resources, with access to various publicly available AMIs, SaaS solutions, and CloudFormation templates to get started quickly with AWS. The session will dig deeper into how to launch critical business applications on AWS, such as deploying an emergency website, launching a SharePoint server, and more. The focus will be on ease of use and the ability to clone environments that the largest customers run, trivializing undifferentiated heavy lifting to emphasize how easily AWS can be deployed in enterprise settings.
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark" (IT Event)
In this talk we’ll explore Apache Spark — the most popular cluster computing framework right now. We’ll look at the improvements that Spark brought over Hadoop MapReduce and what makes Spark so fast; explore Spark programming model and RDDs; and look at some sample use cases for Spark and big data in general.
This talk will be interesting for people who have little or no experience with Spark and would like to learn more about it. It will also be interesting to a general engineering audience as we’ll go over the Spark programming model and some engineering tricks that make Spark fast.
Windows Azure Virtual Machines And Virtual Networks (Kristof Rennen)
Join us for a tour of the features that make up the new Windows Azure Virtual Machines and Virtual Networks offerings. Using demonstrations throughout, we will explain the Virtual Machine storage architecture and show how to provision and customize virtual machines, configure network connectivity between virtual machines, and configure site-to-site networks that enable true applications that span from on-premises to Windows Azure. We'll focus specifically on features that enable you to create highly available Virtual Machine-based services and how to connect Virtual Machines with Cloud Services.
Azure Cosmos DB is a globally distributed, massively scalable, multi-model database service. It provides guaranteed low latency at the 99th percentile, elastic scaling of storage and throughput, comprehensive SLAs, and five consistency models. Cosmos DB offers multiple APIs including SQL, MongoDB, Cassandra, Gremlin, and Table to access and query data.
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series (Amazon Web Services)
This document provides an introduction and best practices for deploying MongoDB on AWS. It describes MongoDB's document model and features like rich queries, geospatial search, and aggregation. New features in MongoDB 3.2 include in-memory storage, encrypted storage, document validation, and dynamic lookups. The document discusses MongoDB high availability using replica sets, elastic scalability through automatic sharding, and query routing. It offers guidance on AWS instance types, EBS volumes, and global deployment architectures. Lastly, it covers management and monitoring tools like Ops Manager, backup strategies, and visual profiling.
This document discusses Azure Backup (Recovery Services) and provides an overview of its key concepts and usage scenarios. Azure Backup allows backing up of on-premises servers and virtual machines to Azure storage. It uses storage vaults mapped to Azure Blob storage to store backup recovery points. The Azure Backup Agent installs on machines to perform backups and restores and manage the backup schedule. Site Recovery allows disaster recovery between on-premises and Azure environments, or between two on-premises sites. References and contacts are provided for further information.
Hortonworks Setup & Configuration on Azure (Anita Luthra)
The document provides instructions for setting up a Hortonworks 2.5 sandbox on Azure. It discusses:
1. Setting up a free Azure account and selecting a virtual machine and SQL Server.
2. Installing Hortonworks 2.5 on a Linux sandbox virtual machine to explore Hadoop capabilities.
3. Understanding how to manage costs, security, and troubleshoot issues with the setup.
Big Data: Big SQL web tooling (Data Server Manager) self-study lab (Cynthia Saracco)
This hands-on lab introduces you to Data Server Manager, a Web tool for querying and monitoring your Big SQL database. Data Server Manager (DSM) and Big SQL support select Apache Hadoop platforms.
This document summarizes an agenda for sessions on DAOS (Domino Attachment and Object Service) and ID Vault. It discusses configuring and best practices for DAOS, including enabling it, using the estimator tool, and location considerations. It also covers configuring ID Vault, including requirements, the setup process, selecting administrators and organizations, and viewing uploaded IDs.
Big Data: Explore Hadoop and BigInsights self-study lab (Cynthia Saracco)
Want a quick tour of Apache Hadoop and InfoSphere BigInsights (IBM's Hadoop distribution)? Follow this self-study lab to get hands-on experience with HDFS, MapReduce jobs, BigSheets, Big SQL, and more. This lab was tested against the free BigInsights Quick Start Edition 3.0 VMware image.
ElastiCache is a web service that allows users to easily deploy, operate, and scale an in-memory cache in the cloud. It improves web application performance by using fast, managed Redis or Memcached caches instead of relying entirely on slower disk-based databases. Amazon ElastiCache offers fully managed Redis and Memcached that can provide sub-millisecond response times for demanding applications. It allows users to seamlessly set up, run, and scale open-source compatible in-memory data stores in the cloud for uses like caching, sessions, gaming, geospatial services, analytics, and queuing.
- The document provides instructions for a hands-on lab to demonstrate big data concepts in Azure.
- It includes steps to create an Azure storage account and load sample data, set up an HDInsight Hadoop cluster, ingest data using Stream Analytics, and visualize data with Power BI.
- The labs will have participants create various Azure services, load and query data, and gain an understanding of how to use common pieces of a big data and analytics solution in Azure.
This document provides an overview of working with Apache HBase through IBM InfoSphere BigInsights. It discusses starting and monitoring the HBase server, creating tables and loading sample data, and exploring different design options for schemas and queries. The hands-on lab demonstrates creating a simple HBase table, inserting rows of data, and modifying column family properties like compression and caching. It presents both conceptual and physical views of how HBase stores data.
MIDAS - Web Based Room & Resource Scheduling Software - LDAP (Active Director... (MIDAS)
This is the complete User Manual for MIDAS (http://mid.as/), a complete web based scheduling solution giving you complete control over your room bookings & resource scheduling. You can find out more about MIDAS at http://mid.as/
This provides a conversational interface to get recommendations.
Insights: This provides insights into usage, spend, and recommendations across all subscriptions under the management group.
In summary, Azure Advisor is a free tool that analyzes your Azure environment and provides customized recommendations to optimize resources and follow best practices per the Azure Well-Architected Framework.
Azure Active Directory
Azure Active Directory (Azure AD) is a cloud-based identity and access management service that helps employees sign in and access resources. It is Microsoft's multi-tenant, cloud-based directory and identity management service.
Some key capabilities of Azure AD include:
- User and group management
This document provides guidance on using cloud tools to build data assets like Kafka topics, schemas, and Iceberg tables. It describes how to create each type of data asset step-by-step within the CDP Sandbox environment. The document also lists streaming data topics and examples that are available for use in applications. It suggests options for integrating external public data sources or data simulators. References are provided for additional documentation and code examples.
Drupal 7x Installation - Introduction to Drupal Concepts (Micky Metts)
This document provides an overview of a presentation on installing and configuring Drupal 7. It discusses downloading and installing Drupal, creating a database, enabling modules, and navigating the administrative screens. It also recommends modules helpful for administrators and provides resources for learning more about Drupal. The presentation includes labs for attendees to complete hands-on activities like installing modules, with questions welcomed throughout.
SQL Server 2014 includes several new features including in-memory OLTP, natively compiled stored procedures, and enhancements to backup and restore. It also improves integration with Azure and includes new T-SQL enhancements. The speaker is a database administrator providing a high-level overview of the key new capabilities in SQL Server 2014 without going into detailed demonstrations.
It is estimated that 30% of cloud spend is wasted. How do you avoid wasting 30% of your Azure spend? This talk takes you through the approach NewOrbit takes to analyse and optimise Azure costs.
Strategies to automate deployment and provisioning of Microsoft Azure (HARMAN Services)
Hear Michael Collier, Principal Cloud Architect at Aditi Technologies talk about the key automation strategies for success in Microsoft Azure, followed by a quick demo of Brewmaster, an automated provisioning and deployment tool for Azure.
Presentation from June 2013, Surrey, BC, Drupal Group meetup.
- Some tips on how to improve Drupal 7 performance.
- Get Drupal 7 working faster
- Optimize code in order to get proper responses
- Use cache (memcache, APC cache, entity cache, varnish)
- Scale Drupal horizontally in order to balance load
Microsoft R Server for distributed computing, presented by กฤษฏิ์ คำตื้อ, Technical Evangelist, Microsoft (Thailand) Limited, at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the Graduate School of Applied Statistics (NIDA) and DATA SCIENCES THAILAND.
A quick overview of how to get started visualizing your Drupal (or PHP in general, or whatever language) code. Helpful for learning complicated systems, finding performance bottlenecks, and feeling cool.
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudAlluxio, Inc.
Alluxio Tech Talk
Mar 12, 2019
Speaker:
Bin Fan, Alluxio
Matt Fuller, Starburst
As data analytic needs have increased with the explosion of data, the importance of the speed of analytics and the interactivity of queries has increased dramatically
In this tech talk, we will introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly-concurrent and low-latency analytics platform. This stack provides a strong solution to run fast SQL across multiple storage systems including HDFS, S3, and others in public cloud, hybrid cloud, and multi-cloud environments.
You’ll learn about:
- The architecture of Presto, an open source distributed SQL engine, as well as innovations by Starburst like as it’s cost-based optimizer
- How Presto can query data from cloud object storage like S3 at high performance and cost-effectively with Alluxio
- How to achieve data locality and cross-job caching with Alluxio no matter where the data is persisted and reduce egress costs
In addition, we’ll present some real world architectures & use cases from internet companies like JD.com and NetEase.com running the Presto and Alluxio stack at the scale of hundreds of nodes.
2. This presentation is available via recordings.
Blog: Create HDInsight Cluster in Azure Portal
http://blogs.msdn.com/b/cindygross/archive/2015/02/26/create-hdinsight-cluster-in-azure-portal.aspx
YouTube Playlist SQLCindy - Getting Started with HDInsight
https://www.youtube.com/playlist?list=PLAD2dOpGM3s1R2L5HgPMX4MkTGvSza7gv
4. Why Hadoop?
• Scale-out
• Load data now, add schema later (write once, read many)
• Fail fast – iterate through many questions to find the right question
• Faster time from question to insight
• Hadoop is “just another data source” for BI, Analytics, Machine Learning
5. Why HDInsight?
• HDInsight is Hadoop on Azure as a service
• Easy, cost effective, changeable scale out data processing
• Lower TCO – easily add/remove/scale
• Separation of storage and compute allows data to exist across clusters
6. HDInsight Technology
• Hortonworks HDP is one of the three major Hadoop distributions, and the most purely open source
• HDInsight *IS* Hortonworks HDP as a service in Azure (cloud)
• The metastore (HCatalog) exists independently across clusters via SQL DB
• The number, size, and type of clusters are flexible and can all access the same data
• Hive is a Hadoop component that makes data look like rows/columns for data warehouse type activities
7. Why Big Data in the Azure Cloud?
• Instantly access data born in the cloud
• Easily, cheaply load, share, and merge public or private data
• Data exists independently across clusters (separation of storage and compute) via WASB on Azure storage accounts
9. Get an Azure Subscription
Trial: http://azure.microsoft.com/en-us/pricing/free-trial/
MSDN Subscription: http://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
Startup BizSpark: http://azure.microsoft.com/en-us/pricing/member-offers/bizspark-startups/
Classroom: http://www.microsoftazurepass.com/azureu
Pay-As-You-Go or Enterprise Agreement: http://azure.microsoft.com/en-us/pricing/
10. Login to Azure Subscription
1. Login on the Azure Portal: https://manage.windowsazure.com
2. Use a Microsoft Account: http://www.microsoft.com/en-us/account/default.aspx
Note: Some companies have federated their accounts and can use company accounts.
11. Choose Subscription
Most accounts will have only one Azure subscription associated with them. But if you seem to have unexpected resources, check to make sure you are in the expected subscription. The Subscriptions button is on the upper right of the Azure portal.
12. Add Accounts
Option: Add more Microsoft Accounts as admins of the Azure Subscription.
1. Choose SETTINGS at the very bottom on the left.
2. Then choose ADMINISTRATORS at the top. Click on the ADD button at the very bottom.
3. Enter a Microsoft Account or federated enterprise account that will be an admin.
14. Create a Storage Account
1. Click on STORAGE in the left menu, then NEW.
2. URL: Choose a storage account name that is unique within *.core.windows.net.
3. LOCATION: Choose the same location for the SQL Azure metastore database, the storage account(s), and HDInsight.
4. REPLICATION: Locally redundant stores fewer copies and costs less.
Repeat if you need additional storage.
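The same storage account can also be created from the classic (Service Management) Azure PowerShell module instead of the portal. A minimal sketch; the subscription, account name, and location below are placeholders, and exact parameters may vary by module version:

```powershell
# Classic (Service Management) Azure PowerShell; all names are placeholders.
Add-AzureAccount                                        # interactive sign-in
Select-AzureSubscription -SubscriptionName "MySubscription"

# Locally redundant (Standard_LRS) stores fewer copies and costs less.
New-AzureStorageAccount -StorageAccountName "mystorageacct" `
    -Location "West US" -Type "Standard_LRS"
```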
15. Create a Container
1. Click on your storage account in the left menu, then CONTAINERS on the top.
2. Choose CREATE A CONTAINER or choose the NEW button at the bottom.
3. Enter a lower-case NAME for the container, unique within that storage account.
4. Choose either Private or Public ACCESS. If there is any chance of sensitive or PII data being loaded to this container, choose Private. Private access requires a key. HDInsight can be configured with that key during creation, or keys can be passed in for individual jobs.
This will be the default container for the cluster. If you want to manage your data separately you may want to create additional containers.
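If you prefer to script the container as well, a sketch using the classic Azure storage cmdlets; the account and container names are placeholders:

```powershell
# Build a storage context from the account's primary key, then create the container.
$key = (Get-AzureStorageKey -StorageAccountName "mystorageacct").Primary
$ctx = New-AzureStorageContext -StorageAccountName "mystorageacct" -StorageAccountKey $key

# -Permission Off keeps the container private, the safe choice for sensitive/PII data.
New-AzureStorageContainer -Name "defaultcontainer" -Permission Off -Context $ctx
```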
18. Create a Metastore aka Azure SQL DB
Persist your Hive and Oozie metadata across cluster instances, even if no cluster exists, with an HCatalog metastore in an Azure SQL Database. This database should not be used for anything else. While it works to share a single metastore across multiple instances, it is not officially tested or supported.
1. Click on SQL DATABASES, then NEW, and choose CUSTOM CREATE.
2. Choose a NAME unique to your server.
3. Click on the “?” to help you decide what TIER of database to create.
4. Use the default database COLLATION.
5. If you choose an existing SERVER you will share sysadmin access with other databases.
19. Firewall Rules
In order to refer to the metastore from automated cluster creation scripts such as PowerShell, your workstation must be added to the firewall rules.
1. Click on MANAGE, then choose YES.
2. You can also use the MANAGE button to connect to the SQL Azure database and manage logins and permissions.
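The metastore server, database, and firewall rule can also be scripted with the classic cmdlets. A sketch; the admin login, password, database name, and workstation IP are placeholders:

```powershell
# Create the Azure SQL server that will host the metastore database.
$server = New-AzureSqlDatabaseServer -AdministratorLogin "metastoreadmin" `
    -AdministratorLoginPassword "StrongP@ssw0rd!" -Location "West US"

# Dedicate a database to the Hive/Oozie metastore; don't reuse it for anything else.
New-AzureSqlDatabase -ServerName $server.ServerName `
    -DatabaseName "HiveMetastore" -Edition "Standard"

# Open the firewall to your workstation so scripted cluster creation can reach the metastore.
New-AzureSqlDatabaseServerFirewallRule -ServerName $server.ServerName `
    -RuleName "workstation" -StartIpAddress "203.0.113.5" -EndIpAddress "203.0.113.5"
```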
21. How to Create an HDInsight Cluster
• Quick Create through the Azure portal is the fastest way to get started with all the default settings.
• The Azure portal Custom Create allows you to customize size, storage, and other configuration options.
• You can customize and automate through code, including .NET and PowerShell. This increases standardization and lets you automate the creation and deletion of clusters over time.
• For all the examples here we will create a basic Hadoop cluster with Hive, Pig, and MapReduce.
• A cluster takes several minutes to create; the type and size of the cluster have little impact on the creation time.
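As a taste of the code route, a quick-create equivalent using the classic `New-AzureHDInsightCluster` cmdlet; the cluster, account, and container names are placeholders:

```powershell
# Prompt for the cluster admin credentials and look up the storage key.
$creds = Get-Credential
$key   = (Get-AzureStorageKey -StorageAccountName "mystorageacct").Primary

# A basic 4-node Hadoop cluster with default settings (quick-create equivalent).
New-AzureHDInsightCluster -Name "mycluster" -Location "West US" `
    -DefaultStorageAccountName "mystorageacct.blob.core.windows.net" `
    -DefaultStorageAccountKey $key `
    -DefaultStorageContainerName "defaultcontainer" `
    -ClusterSizeInNodes 4 -Credential $creds
```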
23. Option 1: Quick Create
For your first cluster choose a Quick Create.
1. Click on HDINSIGHT in the left menu, then NEW.
2. Choose Hadoop. HBase and Storm also include the features of a basic Hadoop cluster but are optimized for in-memory key-value pairs (HBase) or alerting (Storm).
3. Choose a NAME unique in the azurehdinsight.net domain.
4. Start with a small CLUSTER SIZE, often 2 or 4 nodes.
5. Choose the admin PASSWORD.
6. The location of the STORAGE ACCOUNT determines the location of the cluster.
25. Option 2: Custom Create
You can also customize your size, admin account, storage, metastore, and more through the portal. We’ll walk through a basic Hadoop cluster.
1. Click on HDINSIGHT in the left menu, then NEW in the lower left.
2. Choose CUSTOM CREATE.
<continued>
26. Custom Create Basic Info
1. Choose a NAME unique in the azurehdinsight.net domain.
2. Choose Hadoop. HBase and Storm also include the features of a basic Hadoop cluster but are optimized for in-memory key-value pairs (HBase) or alerting (Storm).
3. Choose Windows or Linux as the OPERATING SYSTEM. Linux is only available if you have signed up for the preview.
4. In most cases you will want the default VERSION.
<continued>
27. Custom Create Size and Location
1. Choose the number of DATA NODES for this cluster. Head nodes and gateway nodes will also be created, and they all use HDInsight cores. For information on how many cores are used by each node see the “Pricing details” link.
2. Each subscription has a billing limit set for the maximum number of HDInsight cores available to that subscription. To change the number available to your subscription choose “Create a support ticket.” If the total of all HDInsight cores in use plus the number needed for the cluster you are creating exceeds the billing limit, you will receive the message: “This cluster requires X cores, but only Y cores are available for this subscription”. Note that the messages are in cores and your configuration is specified in nodes.
3. The storage account(s), metastore, and cluster will all be in the same REGION.
<continued>
28. Custom Create Cluster Admin
1. Choose an administrator USER NAME. It is more secure to avoid “admin” and to choose a relatively obscure name. This account will be added to the cluster and doesn’t have to match any existing external accounts.
2. Choose a strong PASSWORD of at least 10 characters with upper/lower case letters, a number, and a special character. Some special characters may not be accepted.
<continued>
29. Custom Create Metastore (HCatalog)
On the same page as the Hadoop cluster admin account you can optionally choose to use a common metastore (HCatalog).
1. Click on the blue box to the right of “Enter the Hive/Oozie Metastore”. This makes more fields available.
2. Choose the SQL Azure database you created earlier as the METASTORE.
3. Enter a login (DATABASE USER) and PASSWORD that allow you to access the METASTORE database. If you encounter errors, try logging in to the database manually from the portal. You may need to open firewall ports or change permissions.
<continued>
30. Custom Create Default Storage Account
Every cluster has a default storage account. You can optionally specify additional storage accounts at cluster create time or at run time.
1. To access existing data on an existing STORAGE ACCOUNT, choose “Use Existing Storage”.
2. Specify the NAME of the existing storage account.
3. Choose a DEFAULT CONTAINER on the default storage account. Other containers (units of data management) can be used as long as the storage account is known to the cluster.
4. To add ADDITIONAL STORAGE ACCOUNTS that will be accessible without the user providing the storage account key, specify that here.
<continued>
31. Custom Create Additional Storage Accounts
If you specified that there will be additional accounts, you will see this screen.
1. If you choose “Use Existing Storage” you simply enter the NAME of the storage account.
2. If you choose “Use Storage From Another Subscription” you specify the NAME and the GUID KEY for that storage account.
<continued>
32. Custom Create Script Actions
You can add additional components or configure existing components as the cluster is deployed. This is beyond the scope of this demo.
1. Click “add script action” to show the remaining parameters.
2. Enter a unique NAME for your action.
3. The SCRIPT URI points to code for your custom action.
4. Choose the NODE TYPE for deployment.
<continued>
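The custom-create choices above (size, storage, metastore, script action) map onto a config pipeline in the classic PowerShell cmdlets. A sketch, with every name, key, and script URI a placeholder:

```powershell
$clusterCreds = Get-Credential    # cluster admin
$metaCreds    = Get-Credential    # metastore database login
$key       = (Get-AzureStorageKey -StorageAccountName "mystorageacct").Primary
$secondKey = (Get-AzureStorageKey -StorageAccountName "secondacct").Primary

New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 |
  Set-AzureHDInsightDefaultStorage -StorageAccountName "mystorageacct.blob.core.windows.net" `
      -StorageAccountKey $key -StorageContainerName "defaultcontainer" |
  Add-AzureHDInsightStorage -StorageAccountName "secondacct.blob.core.windows.net" `
      -StorageAccountKey $secondKey |
  Add-AzureHDInsightMetastore -SqlAzureServerName "myserver.database.windows.net" `
      -DatabaseName "HiveMetastore" -Credential $metaCreds -MetastoreType HiveMetastore |
  Add-AzureHDInsightScriptAction -Name "customize" -ClusterRoleCollection HeadNode `
      -Uri "https://mystorageacct.blob.core.windows.net/scripts/customize.ps1" |
  New-AzureHDInsightCluster -Name "mycluster" -Location "West US" -Credential $clusterCreds
```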
33. Create is Done!
Once you click on the final checkmark, Azure goes to work and creates the cluster. This takes several minutes. When the cluster is ready you can view it in the portal.
35. Hive Console
The simplest, most relatable way for most people to use Hadoop is via the SQL-like, database-like Hive and HiveQL (HQL).
1. Put focus on your HDInsight cluster and choose QUERY CONSOLE to open a new tab in your browser. In my case it opens: https://dragondemo1.azurehdinsight.net/
2. Click on Hive Editor.
36. Query Hive
The query console defaults to selecting the first 10 rows from the pre-loaded sample table. This table is created when the cluster is created.
1. Optionally edit or replace the default query: Select * from hivesampletable LIMIT 10;
2. Optionally name your query to make it easier to find in the job history.
3. Click Submit.
Hive is a batch system optimized for processing huge amounts of data. It spends several seconds up front splitting the job across the nodes, and this overhead exists even for small result sets. If you are doing the equivalent of a table scan in SQL Server and have enough nodes in Hadoop, Hadoop will probably be faster than SQL Server. If your query uses indexes in SQL Server, then SQL Server will likely be faster than Hive.
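The same query can also be submitted without the console; a sketch using the classic `Use-AzureHDInsightCluster` and `Invoke-Hive` cmdlets against a placeholder cluster name:

```powershell
# Point the session at the cluster, then run the sample query as a batch Hive job.
Use-AzureHDInsightCluster -Name "mycluster"
Invoke-Hive -Query "SELECT * FROM hivesampletable LIMIT 10;"
```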
37. View Hive Results
1. Click on the Query you just submitted in the Job Session. This opens a new tab.
2. You can see the text of the Job Query that was submitted. You can Download it.
3. The first few lines of the Job Output (query result) are available. To see the full output choose Download File.
4. The Job Log has details, including errors if there are any.
5. Additional information about the job is available in the upper right.
38. View Hive Data in Excel Workbook
At this point HDInsight is “just another data source” for any application that supports ODBC.
1. Install the Microsoft Hive ODBC driver.
2. Define an ODBC data source pointing to your HDInsight instance.
3. From DATA choose From Other Sources and From Data Connection Wizard.
39. View Hive Data in PowerPivot
At this point HDInsight is “just another data source” for any application that supports ODBC.
1. Install the Microsoft Hive ODBC driver.
2. Define an ODBC data source pointing to your HDInsight instance.
3. Click on POWERPIVOT then choose Manage. This opens a new PowerPivot for Excel window.
4. Choose Get External Data then Others (OLEDB/ODBC).
Now you can combine the Hive data with other data inside the tabular PowerPivot data model.
41. Load Data
In the cloud you don’t have to load data to Hadoop; you can load data to an Azure Storage Account. Then you point your HDInsight or other WASB-compliant Hadoop cluster to the existing data source. There are many ways to load data; for the demo we’ll use CloudXplorer.
You use the Accounts button to add Azure, S3, or other data/storage accounts you want to manage.
In this example nealhadoop is the Azure storage account, demo is the container, and bacon is a “directory”. The files are bacon1.txt and bacon2.txt. Any Hive tables would point to the bacon directory, not to individual files. Drag and drop files from Windows Explorer to CloudXplorer.
Windows Azure Storage Explorers (2014)
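Uploads can be scripted too; a sketch with the classic `Set-AzureStorageBlobContent` cmdlet, reusing the nealhadoop/demo/bacon names from the example (the local file path is a placeholder):

```powershell
# Context for the storage account holding the demo container.
$key = (Get-AzureStorageKey -StorageAccountName "nealhadoop").Primary
$ctx = New-AzureStorageContext -StorageAccountName "nealhadoop" -StorageAccountKey $key

# The "bacon/" prefix in the blob name is what Hadoop reads as a directory.
Set-AzureStorageBlobContent -File "C:\data\bacon1.txt" -Container "demo" `
    -Blob "bacon/bacon1.txt" -Context $ctx
```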
43. Pricing
You are charged for the time the cluster exists, regardless of how busy it is. Check the website for the most recent information.
Due to the separation of storage and compute, you can drop your cluster when it’s not in use and easily add it back, pointing to existing data stores that are still there, when it’s needed again.
45. Automate with PowerShell
With PowerShell, .NET, or the cross-platform command line tools you can specify even more configuration settings that aren’t available in the portal. This includes node size, a library store, and changing default configuration settings such as Tez and compression.
Automation allows you to standardize, and with version control lets you track your configurations over time.
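A sketch of two of those portal-unavailable settings, head node VM size and a Hive configuration override. The .NET type name and parameters below are from the classic module and changed across releases, so treat this as illustrative only; all names are placeholders:

```powershell
$creds = Get-Credential
$key   = (Get-AzureStorageKey -StorageAccountName "mystorageacct").Primary

# Hive configuration override (enable compressed output), applied at create time.
$hiveConf = New-Object `
  'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightHiveConfiguration'
$hiveConf.Configuration = @{ "hive.exec.compress.output" = "true" }

New-AzureHDInsightClusterConfig -ClusterSizeInNodes 8 -HeadNodeVMSize Large |
  Add-AzureHDInsightConfigValues -Hive $hiveConf |
  Set-AzureHDInsightDefaultStorage -StorageAccountName "mystorageacct.blob.core.windows.net" `
      -StorageAccountKey $key -StorageContainerName "defaultcontainer" |
  New-AzureHDInsightCluster -Name "mycluster" -Location "West US" -Credential $creds
```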
47. HDInsight WrapUp
• HDInsight is Hadoop on Azure as a service, specifically Hortonworks HDP on either Windows or Linux
• Easy, cost effective, changeable scale-out data processing for a lower TCO – easily add/remove/scale
• Separation of storage and compute allows data to exist across clusters via WASB
• The metastore (HCatalog) exists independently across clusters via SQL DB
• The number, size, and type of clusters are flexible and can all access the same data
• Instantly access data born in the cloud; easily, cheaply load, share, and merge public or private data
• Load data now, add schema later (write once, read many)
• Fail fast – iterate through many questions to find the right question
• Faster time from question to insight
• Hadoop is “just another data source” for BI, Analytics, Machine Learning
You can make the system more secure if you create a custom login on the Azure SQL server. Add that login as a user in the database you just created and grant it minimal read/write permissions in the database. This is not well documented or tested, so the exact permissions needed are unclear; you may see odd errors if you don’t grant the appropriate permissions.
Use Additional Storage Accounts with HDInsight Hive
http://blogs.msdn.com/b/cindygross/archive/2014/05/05/use-additional-storage-accounts-with-hdinsight-hive.aspx
Using multiple storage accounts lets you manage billing, security, backups, and high availability separately for each account. It also enables cross-subscription access.
Generally you want to manage the storage accounts and load data outside of the cluster(s) existence, so choose “use existing storage”. If you let the cluster creation create the storage you lose control. This enables separation of storage and compute so that multiple clusters can access the same data.
Customize HDInsight clusters using Script Action http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-customize-cluster/
LIMIT is similar to TOP in T-SQL.
HQL is most similar to MySQL’s implementation of the ANSI-SQL standard.
Technically Azure doesn’t have directories, but Hadoop interprets a file named with a / as being in a directory structure. CloudXplorer is the only free GUI storage explorer that makes that easy to visualize and configure.
.NET and the Azure Cross-platform (xplat) command line tools are also an option.
Sample PowerShell Script: HDInsight Custom Create
http://blogs.msdn.com/b/cindygross/archive/2013/12/06/sample-powershell-script-hdinsight-custom-create.aspx
If your HDInsight and/or Azure cmdlets don’t match the current documentation or return unexpected errors, run Web Platform Installer and check for a new version of “Microsoft Azure PowerShell with Microsoft Azure SDK” or “Microsoft Azure PowerShell (standalone)”.