@DoktorKermit & @regbac 
kmn@rehfeld.dk rba@rehfeld.dk 
#CampusDays
#CampusDays 
Agenda 
Elements in a BIG DATA Project on AZURE 
• Walkthrough of the elements needed 
HDInsight 
• Deploy through Azure Portal 
• Deploy with PowerShell and Windows Azure SQL Database
• Multiple Storage Accounts and Configuration Values 
• Deploy as part of your normal ETL
#CampusDays 
Elements in a BIG DATA Project on AZURE
#CampusDays 
Elements in a BIG DATA Project on AZURE 
• AZURE Account 
• Storage Account 
• SQL Server 
• SQL Databases 
• Firewall rules 
• HDInsight Cluster 
• Data 
• Hive Scripts 
• Machine Learning
#CampusDays 
Deployment via AZURE portal
#CampusDays 
Deployment via AZURE portal 
Requirements 
• AZURE Account 
• Either a free trial 
• MSDN Subscription 
• Or paid subscription 
• Create one here - http://azure.microsoft.com/da-dk/pricing/free-trial/
#CampusDays 
Deployment via AZURE portal 
• Storage account
• The name must be lowercase
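The lowercase requirement can be checked before any deployment is attempted. A minimal sketch, assuming the documented Azure naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only); the function name Test-StorageAccountName is hypothetical and not part of the talk:

```powershell
# Hypothetical helper: validates a candidate storage account name locally,
# so a bad name is caught before the portal or a script rejects it.
function Test-StorageAccountName {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name
    )
    # -cmatch is the case-sensitive operator; the default -match ignores case
    # and would let uppercase letters through.
    return ($Name -cmatch '^[a-z0-9]{3,24}$')
}

Test-StorageAccountName -Name 'campusdays2014'   # True
Test-StorageAccountName -Name 'CampusDays2014'   # False
```

The check only covers the format of the name; whether the name is still available as a subdomain can only be answered by Azure itself.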
#CampusDays 
Deployment via AZURE portal 
• SQL Server 
• Create either when creating a database
• Or alone without a database
#CampusDays 
Deployment via AZURE portal 
• SQL Databases 
• Easily created: only a name, a server and a subscription are needed
#CampusDays 
Deployment via AZURE portal 
• Firewall Rules 
• Without them, the cluster cannot see the metastore and cluster creation fails
#CampusDays 
Deployment via AZURE portal 
• HDInsight Cluster 
• Needs a storage account 
• Firewall rules must be set to allow all AZURE Services
#CampusDays 
Deployment via AZURE portal 
• Upload files to Azure 
• Use Azure Explorer 
• Upload files yourself 
• Import job via portal 
• Ship a hard drive to Microsoft
• Demo
#CampusDays 
Deployment via AZURE portal 
• Many steps 
• Easy to make mistakes 
• This will be done over and over again 
• Is there another way to make this easier? 
• YES! 
• Let's have a look at it
#CampusDays 
Let’s automate it – using PowerShell
#CampusDays 
Let’s automate it – using PowerShell 
• Using PowerShell 
• Multiple scripts 
• Configuration
#CampusDays 
Let’s automate it – using PowerShell 
• Why Automate it? 
• Reliability 
• Repeatability 
• Save time 
• Eliminate tiresome work 
• Eliminate manual work 
• Manual work is bound to fail at some point
#CampusDays 
Let’s automate it – using PowerShell 
• Configuration 
• Flexible 
• Create and recreate 
• Upload data to Cluster 
• Easy to make changes to project 
• Easy to test
#CampusDays 
Demo
#CampusDays 
Let’s automate it – using PowerShell 
• Load Data to Cluster 
• Configuration 
• Should files be downloaded?
• Should files be uploaded?
• Directories 
• Automate download 
• Unzip files 
• Upload csv 
• Cleanup
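The load steps above could be sketched as a single function. This is a sketch under stated assumptions, not the script shown in the demo: the function name, its parameters, the file name data.zip and the directory layout are all hypothetical. Expand-Archive assumes PowerShell 5 or later, and the storage cmdlets are those of the classic Azure PowerShell module used elsewhere in this deck, so they should be checked against the installed version.

```powershell
# Hypothetical sketch of the download / unzip / upload / cleanup flow.
function Invoke-DataLoad {
    param(
        [string]$SourceUrl,            # hypothetical: URL of the zipped source data
        [string]$WorkDir,              # local working directory
        [string]$StorageAccountName,
        [string]$ContainerName,
        [bool]$DownloadFiles = $true,  # configuration: should files be downloaded?
        [bool]$UploadFiles = $true     # configuration: should files be uploaded?
    )

    $zipPath = Join-Path $WorkDir 'data.zip'
    $extractDir = Join-Path $WorkDir 'extracted'

    if ($DownloadFiles) {
        New-Item -ItemType Directory -Path $WorkDir -Force | Out-Null
        Invoke-WebRequest -Uri $SourceUrl -OutFile $zipPath
        Expand-Archive -Path $zipPath -DestinationPath $extractDir -Force
    }

    if ($UploadFiles) {
        # Build a storage context from the account's primary key.
        $key = (Get-AzureStorageKey -StorageAccountName $StorageAccountName).Primary
        $context = New-AzureStorageContext -StorageAccountName $StorageAccountName -StorageAccountKey $key

        # Upload every CSV file; the blob is named after the file, so
        # subdirectory structure is not preserved.
        Get-ChildItem -Path $extractDir -Filter '*.csv' -Recurse | ForEach-Object {
            Set-AzureStorageBlobContent -File $_.FullName -Container $ContainerName -Blob $_.Name -Context $context -Force
        }
    }

    # Cleanup: remove the local copies whether or not they were uploaded.
    Remove-Item -Path $zipPath, $extractDir -Recurse -Force -ErrorAction SilentlyContinue
}
```

Keeping the download and upload switches as parameters mirrors the configuration bullets: the same script can refresh the local files, re-upload existing ones, or do both.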
#CampusDays 
Demo
#CampusDays 
Let’s automate it – using PowerShell 
• After usage – clean up -> save money 
• Script to cleanup cluster 
• Storage 
• SQL server 
• SQL databases 
This saves money, and we can easily recreate the objects needed
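A teardown script along these lines might look as follows. It is a sketch only: the function name and parameters are hypothetical, and the cmdlets are those of the classic Azure PowerShell module as documented at the time, so they should be verified against the installed version.

```powershell
# Hypothetical teardown helper; pass the names used when the environment was created.
function Remove-BigDataEnvironment {
    param(
        [Parameter(Mandatory = $true)][string]$ClusterName,
        [Parameter(Mandatory = $true)][string]$SqlServerName,
        [Parameter(Mandatory = $true)][string]$StorageAccountName
    )

    # The cluster costs money for as long as it exists, so remove it first.
    Remove-AzureHDInsightCluster -Name $ClusterName

    # Removing the server also removes the databases hosted on it.
    Remove-AzureSqlDatabaseServer -ServerName $SqlServerName -Force

    # Removing the storage account also deletes the uploaded data.
    Remove-AzureStorageAccount -StorageAccountName $StorageAccountName
}
```

If the uploaded data is expensive to reload, the last step can be left out so that only the cluster and the SQL server are recreated next time.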
#CampusDays 
Demo
#CampusDays 
Let’s automate it – using PowerShell 
• A firewall rule is required
• Without it, the cluster cannot see the metastore and cluster creation fails
• Allow all Azure services
• Set on the SQL Server created earlier
New-AzureSqlDatabaseServerFirewallRule -ServerName Campusdays2014 -AllowAllAzureServices -Verbose
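For context, the firewall rule sits in the middle of a longer provisioning chain. The following is a hedged sketch of that chain, not the script from the demo: every name and value is a hypothetical placeholder, and the cmdlets and parameters are those of the classic Azure PowerShell (Service Management) module as documented around the time of this talk, so they should be verified before use.

```powershell
# All names and values below are hypothetical placeholders.
$location       = 'North Europe'
$storageAccount = 'campusdays2014'     # must be lowercase
$containerName  = 'data'
$clusterName    = 'campusdays2014'
$metastoreDb    = 'HiveMetastore'
$sqlCred        = Get-Credential -Message 'SQL server administrator'
$clusterCred    = Get-Credential -Message 'HDInsight cluster administrator'

# 1. Storage account and the container that will hold the cluster's data.
New-AzureStorageAccount -StorageAccountName $storageAccount -Location $location
$storageKey = (Get-AzureStorageKey -StorageAccountName $storageAccount).Primary
$context = New-AzureStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageKey
New-AzureStorageContainer -Name $containerName -Context $context

# 2. SQL server, the firewall rule, and the metastore database.
$server = New-AzureSqlDatabaseServer -AdministratorLogin $sqlCred.UserName `
    -AdministratorLoginPassword $sqlCred.GetNetworkCredential().Password `
    -Location $location
New-AzureSqlDatabaseServerFirewallRule -ServerName $server.ServerName -AllowAllAzureServices
New-AzureSqlDatabase -ServerName $server.ServerName -DatabaseName $metastoreDb

# 3. Cluster that uses the storage account and the Hive metastore.
New-AzureHDInsightClusterConfig -ClusterSizeInNodes 2 |
    Set-AzureHDInsightDefaultStorage -StorageAccountName "$storageAccount.blob.core.windows.net" `
        -StorageAccountKey $storageKey -StorageContainerName $containerName |
    Add-AzureHDInsightMetastore -SqlAzureServerName "$($server.ServerName).database.windows.net" `
        -DatabaseName $metastoreDb -Credential $sqlCred -MetastoreType HiveMetastore |
    New-AzureHDInsightCluster -Name $clusterName -Location $location -Credential $clusterCred
```

The order matters: the firewall rule has to exist before the cluster is created, because cluster creation is the moment the metastore database is first contacted.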
#CampusDays 
Let’s automate it – using PowerShell 
• Remember to run Add-AzureAccount in your PowerShell session.
• Otherwise you’ll get an error.
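A script can guard against the missing sign-in itself. A minimal sketch, assuming the classic Azure PowerShell module, in which Get-AzureAccount lists the accounts already added; the subscription name is a hypothetical placeholder:

```powershell
# Sign in only when no account has been added yet.
if (-not (Get-AzureAccount)) {
    Add-AzureAccount   # opens an interactive sign-in prompt
}

# Hypothetical subscription name; pick the subscription the resources belong to.
Select-AzureSubscription -SubscriptionName 'My Subscription'
```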
#CampusDays 
HDInsight the SSIS way
#CampusDays 
HDInsight as a part of your ETL 
• Normal ETL on-prem 
• Benefits of the Cloud 
• Staying on-prem
#CampusDays 
Keep the cost down and the flexibility high 
• Supports Hybrid scenarios 
• Run on-prem 
• Create HDInsight cluster 
• Do some cool stuff 
• Destroy the cluster 
• No need for PowerShell knowledge
#CampusDays 
HDInsight SSIS Components
• Community driven 
• More than 10 SSIS components (Incl. connections) 
• First step for moving to the cloud
#CampusDays 
Hadoop Versions
#CampusDays 
Demo
#CampusDays 
Questions ?
EVENT SPONSORS
TRACK SPONSORS
EXPO SPONSORS


Campus days Azure HDInsight automation


Editor's Notes

  3. Today we will look at what a BIG DATA solution on AZURE can contain. There will be a quick walkthrough of the elements, what they contain and what they are used for. After this walkthrough we move on to the more practical part, where we look at how the elements can be created. Can we do it in several different ways, and which one is the best, if there is such a thing? What do I want to achieve with this solution? The goal is to make a large amount of data available to a Machine Learning algorithm that we will look at later. A complete project that supports this must be created on AZURE. We will download data and upload it to our cluster.
  5. You of course need an AZURE account. A storage account is where SQL servers, databases, containers etc. are created, so it MUST be used and be in place when working with data on Azure. SQL Server: a SQL server must be created on Azure to host all the databases that will be used; in this case it is used solely to store metadata about the HDInsight cluster, which is created later. Databases: this is obviously where data is stored, and N databases can be created. Firewall rules: these must be in place to control access to databases and services on AZURE. They allow traffic and access from the individual services to the database that contains the metadata. HDInsight cluster: this is AZURE's Hadoop solution, which contains everything needed to work with BIG DATA in the cloud. Data can be uploaded to this cluster, for example as CSV files, which can be placed in tables and later queried via Hive scripts. Data: the data to be stored in the data container must be uploaded; as mentioned, this can be any kind of data (text, images, sound), as long as there is some form of metadata that can be made searchable via Hive scripts. Hive scripts: this is the query language of Hadoop and HDInsight. It resembles SQL a lot, but it has certain limitations. A Hive script is written and then submitted to the HDInsight cluster. The queries can seem slow, but remember that this is data without indexes, and what we might call mixed data.
  6. Let's start by looking at how we can deploy the elements via the AZURE portal, that is, manually.
  7. An AZURE account is required. This can be a free trial, an MSDN account, or a paid account where you can set a spending limit. REMEMBER that it always costs money to have, for example, an HDInsight cluster standing there switched on, whereas it does NOT cost anything to use storage. In other words, it is CPU time you pay for on AZURE. Here is the link for creating a free trial.
  8. A storage account must be created. It is used to create containers, which is where data is stored as blobs for the HDInsight cluster. Create it by clicking Storage; the portal helpfully tells you if you have not created one yet, in which case you click to create a storage account. It is important to remember that a storage account's name must be unique, since it is used as a subdomain of *.core.windows.net, and that the name MUST be written in lowercase. Then choose the location closest to the physical location that will use the storage account the most. For a Danish solution it pays to choose "North Europe", whereas if we were in Seattle I would choose North America. This is purely because of network connections and distances. Then choose whether your storage account should be geo-redundant, locally redundant, zone-redundant or read-access geo-redundant.
  9. Create a SQL server. It is not possible to create it on its own, so it must be done at the same time as a database is created. Enter a database name, choose the subscription to use, and select New SQL database server. For the location, as with the storage account, choose the one closest to the physical location that will use it the most. For a Danish solution it pays to choose "North Europe", whereas if we were in Seattle I would choose North America. This is purely because of network connections and distances. Enter a user name and password to be used for administering the server.
  10. If you already have a SQL server, you can create a database on it. Give it a name, choose your subscription, and then the server on which it will live.
  11. It is important to create a firewall rule that allows all AZURE services to access your newly created server and database. If this is not done, the service cannot use the database.
  12. Then create an HDInsight cluster. Enter a cluster name and the number of nodes to use: at least 2 nodes for production. For test or demo, 1 node can be plenty, although there is then not much "cluster" about the instance. Again, remember to enter a user name and password for administering the cluster.
  13. Ship hard drives to Microsoft; these have to be encrypted with BitLocker.
  15. Configurable: it is possible to make all
  16. Why automate the work? To create stability and repeatability, to save time, to get rid of the boring work, and to make time for the exciting work. Creating and tearing down instances on Azure is trivial work, and because it is manual it is bound to go wrong at some point.
  22. Normal ETL process on-premises, mix with jobs in the cloud.
  23. Extrapolates. As a developer I know SSIS but not Hive or Sqoop….