@DoktorKermit & @regbac 
kmn@rehfeld.dk rba@rehfeld.dk 
#CampusDays
Agenda 
Elements in a BIG DATA Project on AZURE 
• Walkthrough of the elements needed 
HDInsight 
• Deploy through Azure Portal 
• Deploy with PowerShell and Windows Azure SQL Database 
• Multiple Storage Accounts and Configuration Values 
• Deploy as part of your normal ETL
Elements in a BIG DATA Project on AZURE 
• AZURE Account 
• Storage Account 
• SQL Server 
• SQL Databases 
• Firewall rules 
• HDInsight Cluster 
• Data 
• Hive Scripts 
• Machine Learning
Deployment via AZURE portal 
Requirements 
• AZURE Account 
• Either a free trial 
• MSDN Subscription 
• Or paid subscription 
• Create one here - http://azure.microsoft.com/da-dk/pricing/free-trial/
Deployment via AZURE portal 
• Storage account 
• Name must be lowercase
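The lowercase requirement can be checked before any deployment is attempted. The following is a minimal sketch, not part of the original deck: the function name `Test-StorageAccountName` is hypothetical, and the pattern assumes the Azure rule that storage account names are 3 to 24 characters long and contain only lowercase letters and digits.

```powershell
# Hypothetical helper: validates a storage account name before deployment.
# Assumed rule: 3 to 24 characters, lowercase letters and digits only.
function Test-StorageAccountName {
    param([string] $Name)
    # -cmatch is case-sensitive; plain -match would also accept uppercase letters.
    return $Name -cmatch '^[a-z0-9]{3,24}$'
}
```

Calling `Test-StorageAccountName 'campusdays2014'` passes the check, while a mixed-case name such as `'CampusDays2014'` is rejected.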
Deployment via AZURE portal 
• SQL Server 
• Create it either together with a database 
• Or on its own, without a database
Deployment via AZURE portal 
• SQL Databases 
• Easy to create: only a name, server and subscription are needed
Deployment via AZURE portal 
• Firewall Rules 
• Without them, the cluster cannot reach the metastore and cluster creation fails
Deployment via AZURE portal 
• HDInsight Cluster 
• Needs a storage account 
• Firewall rules must be set to allow all AZURE Services
Deployment via AZURE portal 
• Upload files to Azure 
• Use Azure Explorer 
• Upload files yourself 
• Import job via portal 
• Ship a hard drive to Microsoft 
• Demo
Deployment via AZURE portal 
• Many steps 
• Easy to make mistakes 
• This will be done over and over again 
• Is there another way to make this easier? 
• YES! 
• Let's have a look at it
Let’s automate it – using PowerShell 
• Using PowerShell 
• Multiple scripts 
• Configuration
Let’s automate it – using PowerShell 
• Why Automate it? 
• Reliability 
• Repeatability 
• Save time 
• Eliminate tiresome work 
• Eliminate manual work 
• Manual work is bound to fail at some point
Let’s automate it – using PowerShell 
• Configuration 
• Flexible 
• Create and recreate 
• Upload data to Cluster 
• Easy to make changes to project 
• Easy to test
Demo
Let’s automate it – using PowerShell 
• Load Data to Cluster 
• Configuration 
• Should files be downloaded? 
• Should files be uploaded? 
• Directories 
• Automate download 
• Unzip files 
• Upload csv 
• Cleanup
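The load step listed above could be sketched as a small config-driven PowerShell function. This is a minimal sketch under stated assumptions, not the presenters' script: `Invoke-DataLoad`, the config keys, the example URL and the `-Upload` script block are all hypothetical, and `Expand-Archive` assumes PowerShell 5.0 or later. The upload is passed in as a script block so that a real blob-upload call can be substituted for it.

```powershell
# Hypothetical configuration; every name, path and URL here is an assumption.
$config = @{
    DownloadFiles = $false                       # should files be downloaded?
    UploadFiles   = $true                        # should files be uploaded?
    SourceUrl     = 'https://example.com/data.zip'
    WorkDir       = Join-Path ([IO.Path]::GetTempPath()) 'campusdays-demo'
}

function Invoke-DataLoad {
    param(
        [hashtable]   $Config,
        [scriptblock] $Upload    # called once per CSV file found
    )

    # Directories: one for downloaded archives, one for extracted CSV files.
    $zipDir = Join-Path $Config.WorkDir 'zip'
    $csvDir = Join-Path $Config.WorkDir 'csv'
    New-Item -ItemType Directory -Force -Path $zipDir, $csvDir | Out-Null

    # Automate download, only when the configuration asks for it.
    if ($Config.DownloadFiles) {
        Invoke-WebRequest -Uri $Config.SourceUrl -OutFile (Join-Path $zipDir 'data.zip')
    }

    # Unzip every archive found in the zip directory.
    Get-ChildItem -Path $zipDir -Filter '*.zip' | ForEach-Object {
        Expand-Archive -Path $_.FullName -DestinationPath $csvDir -Force
    }

    # Upload each CSV file through the supplied script block.
    $csvFiles = @(Get-ChildItem -Path $csvDir -Filter '*.csv' -Recurse)
    if ($Config.UploadFiles) {
        foreach ($file in $csvFiles) { & $Upload $file | Out-Null }
    }

    # Cleanup: remove the local working copies.
    Remove-Item -Path $Config.WorkDir -Recurse -Force
    return $csvFiles.Count
}
```

A dry run such as `Invoke-DataLoad -Config $config -Upload { param($f) Write-Host $f.Name }` lists the files that would be uploaded without touching Azure, which makes the flags easy to test before a real run.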
Demo
Let’s automate it – using PowerShell 
• After use, clean up to save money 
• Script to cleanup cluster 
• Storage 
• SQL server 
• SQL databases 
This saves money, and we can easily recreate the objects needed
Demo
Let’s automate it – using PowerShell 
• Firewall Rule is required 
• Without it, the cluster cannot reach the metastore and cluster creation fails 
• Allow All Azure Services 
• On the SQL Server created earlier 
New-AzureSqlDatabaseServerFirewallRule -ServerName Campusdays2014 -AllowAllAzureServices -Verbose
Let’s automate it – using PowerShell 
• Remember to run Add-AzureAccount in your PowerShell session. 
• Otherwise you'll get an error.
HDInsight the SSIS way
HDInsight as a part of your ETL 
• Normal ETL on-prem 
• Benefits of the Cloud 
• Staying on-prem
Keep the cost down and the flexibility high 
• Supports Hybrid scenarios 
• Run on-prem 
• Create HDInsight cluster 
• Do some cool stuff 
• Destroy the cluster 
• No need for PowerShell knowledge
HDInsight SSIS Components 
• Community driven 
• More than 10 SSIS components (Incl. connections) 
• First step for moving to the cloud
Hadoop Versions
Demo
Questions?
JMeter webinar - integration with InfluxDB and Grafana
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Campus days Azure HDInsight automation

  • 1. @DoktorKermit & @regbac kmn@rehfeld.dk rba@rehfeld.dk #CampusDays
  • 2. Agenda. Elements in a BIG DATA Project on AZURE: • Walkthrough of the elements needed. HDInsight: • Deploy through Azure Portal • Deploy with PowerShell and Windows Azure SQL Database • Multiple Storage Accounts and Configuration Values • Deploy as part of your normal ETL
  • 3. Elements in a BIG DATA Project on AZURE
  • 4. Elements in a BIG DATA Project on AZURE • AZURE Account • Storage Account • SQL Server • SQL Databases • Firewall rules • HDInsight Cluster • Data • Hive Scripts • Machine Learning
  • 6. Deployment via AZURE portal: Requirements • AZURE Account • Either a free trial • An MSDN subscription • Or a paid subscription • Create one here: http://azure.microsoft.com/da-dk/pricing/free-trial/
  • 7. Deployment via AZURE portal • Storage account • The name must be lowercase
  • 8. Deployment via AZURE portal • SQL Server • Create either when creating a database • Or alone without a database
  • 9. Deployment via AZURE portal • SQL Databases • Easily created: only a name, server and subscription are needed
  • 10. Deployment via AZURE portal • Firewall Rules • Without them the cluster cannot see the metastore and cluster creation fails
  • 11. Deployment via AZURE portal • HDInsight Cluster • Needs a storage account • Firewall rules must be set to allow all AZURE services
  • 12. Deployment via AZURE portal • Upload files to Azure • Use Azure Explorer • Upload files yourself • Import job via portal • Ship hard drive to Microsoft • Demo
  • 13. Deployment via AZURE portal • Many steps • Easy to make mistakes • This will be done over and over again • Is there another way to make this easier? • YES! • Let’s have a look at it
  • 14. Let’s automate it – using PowerShell
  • 15. Let’s automate it – using PowerShell • Using PowerShell • Multiple scripts • Configuration
  • 16. Let’s automate it – using PowerShell • Why automate it? • Reliability • Repeatability • Save time • Eliminate tiresome work • Eliminate manual work • Manual work is bound to fail at some point
  • 17. Let’s automate it – using PowerShell • Configuration • Flexible • Create and recreate • Upload data to cluster • Easy to make changes to the project • Easy to test
  • 19. Let’s automate it – using PowerShell • Load data to cluster • Configuration • Should files be downloaded? • Should files be uploaded? • Directories • Automate download • Unzip files • Upload CSV • Cleanup
  • 21. Let’s automate it – using PowerShell • After use: clean up to save money • Script to clean up cluster • Storage • SQL server • SQL databases • This saves money, and we can easily recreate the objects needed
  • 23. Let’s automate it – using PowerShell • A firewall rule is required • Without it the cluster cannot see the metastore and cluster creation fails • Allow all Azure services • On the SQL Server created earlier: New-AzureSqlDatabaseServerFirewallRule -ServerName Campusdays2014 -AllowAllAzureServices -Verbose
  • 24. Let’s automate it – using PowerShell • Remember to run Add-AzureAccount in your PowerShell session • Otherwise you’ll get an error
  • 26. HDInsight as a part of your ETL • Normal ETL on-prem • Benefits of the cloud • Staying on-prem
  • 27. Keep the cost down and the flexibility high • Supports hybrid scenarios • Run on-prem • Create HDInsight cluster • Do some cool stuff • Destroy the cluster • No need for PowerShell knowledge
  • 28. HDInsight SSIS Components • Community driven • More than 10 SSIS components (incl. connections) • First step for moving to the cloud
  • 32. EVENT SPONSORS • TRACK SPONSORS • EXPO SPONSORS
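
The load-data step on slide 19 (configuration flags, directories, download, unzip, upload CSV, cleanup) can be sketched roughly as below. This is a minimal illustration in Python rather than the PowerShell used in the talk, so it can be run locally without an Azure subscription. The names `LoadConfig`, `load_data` and `fake_download` are hypothetical, and the real upload to blob storage is replaced by a callback.

```python
import shutil
import tempfile
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class LoadConfig:
    """Hypothetical configuration values mirroring the bullets on slide 19."""
    download_files: bool   # should files be downloaded?
    upload_files: bool     # should files be uploaded?
    work_dir: Path         # local staging directory
    cleanup: bool = True   # remove the staging directory afterwards


def load_data(config: LoadConfig,
              download: Callable[[Path], Path],
              upload: Callable[[Path], None]) -> List[str]:
    """Run download -> unzip -> upload CSV -> cleanup; return uploaded file names."""
    uploaded: List[str] = []
    config.work_dir.mkdir(parents=True, exist_ok=True)
    if config.download_files:
        archive = download(config.work_dir)      # must return the path of a .zip file
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(config.work_dir)       # unzip into the staging directory
    if config.upload_files:
        for csv_file in sorted(config.work_dir.rglob("*.csv")):
            upload(csv_file)                     # only CSV files are uploaded
            uploaded.append(csv_file.name)
    if config.cleanup:
        shutil.rmtree(config.work_dir)           # delete everything that was staged
    return uploaded


def fake_download(target_dir: Path) -> Path:
    """Stand-in for a real download: writes a small zip into target_dir."""
    archive = target_dir / "data.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.csv", "id,value\n1,10\n")
        zf.writestr("notes.txt", "not uploaded")
    return archive
```

Passing the download and upload steps in as callbacks keeps the flags and directories in one configuration object, which is what makes the same script easy to rerun and test; in a real deployment the upload callback would wrap whatever blob-upload mechanism the project uses.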

Editor's Notes

  3. Today we will look at what a BIG DATA solution on AZURE can contain. There will be a quick walkthrough of the elements, what they contain and what they are used for. After this walkthrough we move on to the more practical part, where we look at how the elements can be created. Can we do it in several different ways, and which is the best, if there is one? What do I want to achieve with this solution? The goal is to make a large amount of data available to a Machine Learning algorithm that we will look at later. An entire project that supports this must be created on AZURE. We will download data and upload it to our cluster.
  5. You of course need an AZURE account. A storage account is where SQL servers, databases, containers etc. are created, so it MUST be used and be in place when working with data on Azure. SQL Server: a SQL server must be created on Azure to host all the databases that will be used; in this case it is used solely to store metadata about the HDInsight cluster, which is created later. Databases: this is obviously where data is stored, and any number of databases can be created. Firewall rules: these must be in place to control access to databases and services on AZURE. They allow traffic and access from the individual services to the database that holds the metadata. HDInsight cluster: this is AZURE's Hadoop solution, containing everything needed to work with BIG DATA in the cloud. Data can be uploaded to this cluster in the form of, for example, CSV files, which can be loaded into tables and later queried via Hive scripts. Data: the data to be stored in the data container must be uploaded. As mentioned, this can be any kind of data (text, images, sound), as long as there is some form of metadata that can be made searchable via Hive scripts. Hive scripts: this is the query language of Hadoop and HDInsight. It closely resembles SQL, though with certain limitations. You write a Hive script and then submit it to your HDInsight cluster. Query speed can seem slow, but remember that this is data without indexes, and what we might call mixed data.
  6. Let's start by looking at how we can deploy the elements via the AZURE portal, that is, manually.
  7. It is a requirement to have an AZURE account. This can be a free trial, an MSDN account, or a paid account where you can set a spending limit. REMEMBER that it always costs money to have, for example, an HDInsight cluster up and running, whereas it does NOT cost anything to use storage. In other words, it is CPU time you pay for on AZURE. Here is the link for creating a free trial.
  8. A storage account must be created. It is used to create containers, which is where data is stored as blobs for the HDInsight cluster. Create it by clicking Storage; the portal helpfully tells you if you have not yet created one, in which case you click Create Storage Account. Remember that a storage account's name must be unique, because it is used as a subdomain of *.core.windows.net, and that the name MUST be lowercase. Then choose the location closest to the physical location that will use the storage account the most. For a Danish solution it pays to choose "North Europe", whereas if we were in Seattle I would choose North America. This is purely because of network connections and distances. Finally, choose whether your storage account should be geo-redundant, locally redundant, zone-redundant or read-access geo-redundant.
  9. Create a SQL server. It is not possible to create it on its own, so this must be done at the same time as a database is created. Enter a database name, choose the subscription to use, select New SQL database server, and pick the location. As with the storage account, choose the location closest to the physical location that will use the server the most. For a Danish solution it pays to choose "North Europe", whereas if we were in Seattle I would choose North America. This is purely because of network connections and distances. Enter a username and password to be used for administering the server.
  10. If you already have a SQL server, you can create a database on it. Give it a name, choose your subscription, and then the server it should live on.
  11. It is important to create a firewall rule that allows all AZURE services to access your newly created server and database. If this is not done, the service cannot use the database.
  12. Next, create an HDInsight cluster. Enter a cluster name and how many nodes to use: at least 2 nodes for production. For test or demo, 1 node can be plenty, although the instance is then not much of a cluster. Again, remember to enter a username and password for administering the cluster.
  13. Ship hard drives to Microsoft; these have to be encrypted with BitLocker.
  15. Configurable: it is possible to make all
  16. Why automate the work? To gain reliability and repeatability, save time, get rid of the tedious work, and free up time for the interesting work. The task of creating and tearing down instances on Azure is trivial, and it is bound to go wrong at some point because it is manual work.
  22. Normal ETL process on-premises, mix with jobs in the cloud.
  23. Extrapolates. As a developer I know SSIS but not Hive or Sqoop….
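
Note 8 stresses that a storage account name must be unique, lowercase, and usable as a subdomain of *.core.windows.net. A small pre-flight check along those lines might look like the sketch below. It is written in Python for easy local testing, not in the PowerShell used in the talk. The 3 to 24 character limit and the letters-and-digits restriction come from Azure's published naming rules rather than from the slides, and the function names and the example account name are hypothetical. Uniqueness cannot be checked offline and is not covered.

```python
import re

# Assumption from Azure's naming rules (not stated in the slides):
# 3 to 24 characters, lowercase letters and digits only.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the lowercase and length rules."""
    return _STORAGE_NAME.fullmatch(name) is not None


def blob_endpoint(name: str) -> str:
    """Build the blob endpoint the account name would become a subdomain of."""
    if not is_valid_storage_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return f"https://{name}.blob.core.windows.net"


if __name__ == "__main__":
    # "campusdays2014" is a made-up account name used only for illustration.
    for candidate in ("campusdays2014", "CampusDays2014", "ab"):
        print(candidate, is_valid_storage_account_name(candidate))
```

Running a check like this before calling any provisioning script catches the uppercase mistake early, before a failed deployment has to be cleaned up.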