LINQ to HPC: Developing Big Data Applications on Windows HPC Server

Big data is a rapidly growing customer need. The HPC team is enabling commodity clusters running Windows HPC Server to address the “unstructured” part of the big data workload using the Dryad distributed runtime. We show demos of Dryad and Windows HPC Server, discuss how Dryad and Microsoft SQL Server can be combined in end-to-end solutions that handle both structured and unstructured data, and discuss how to administer Windows HPC Server clusters running Dryad applications.

Published in: Software
  1. Large Data Volume
       • 100s of TBs to 10s of PBs
     New Economics
       • Large-scale processing and analytics at unprecedented low cost (hardware and software)
     New Technologies
       • Distributed parallel processing frameworks
       • Easy to scale on commodity hardware
       • MapReduce-style programming models
     Non-Traditional Data Types
       • Unstructured
       • Weak relational schema
       • Text, images, videos, logs
     New Data Sources
       • Sensors
       • Devices
       • Traditional applications
       • Web servers
       • Public data
     New Questions & New Insights
       • How popular is my product?
       • What is the best ad to serve?
       • Is this a fraudulent transaction?
  4. var logentries = from line in logs
                      where !line.StartsWith("#")
                      select new LogEntry(line);

     var user = from access in logentries
                where access.user.EndsWith(@"sen")
                select access;

     var accesses = from access in user
                    group access by access.page into pages
                    select new UserPageCount("sen", pages.Key, pages.Count());

     var htmAccesses = from access in accesses
                       where access.page.EndsWith(".htm")
                       orderby access.count descending
                       select access;

     The LINQ query is transformed into a computation graph with the stages Input, Compute, Compute and resort, Compute and resort, and Output.
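The four query stages on slide 4 can be tried locally with LINQ to Objects before involving a cluster. The following is a minimal sketch under stated assumptions: the slide does not define `LogEntry` or `UserPageCount`, so the versions here (including the space-separated `user page` line format) are hypothetical stand-ins, and the query runs in memory rather than as a distributed computation graph.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: the slide does not show LogEntry. We assume each
// non-comment log line has the form "user page", separated by one space.
public class LogEntry
{
    public readonly string user;
    public readonly string page;

    public LogEntry(string line)
    {
        string[] parts = line.Split(' ');
        user = parts[0];
        page = parts[1];
    }
}

// Hypothetical stand-in for the slide's UserPageCount.
public class UserPageCount
{
    public readonly string user;
    public readonly string page;
    public readonly int count;

    public UserPageCount(string user, string page, int count)
    {
        this.user = user;
        this.page = page;
        this.count = count;
    }
}

public static class LogQuerySketch
{
    // Runs the slide's four stages in memory with LINQ to Objects.
    public static List<UserPageCount> HtmAccessesFor(IEnumerable<string> logs, string userSuffix)
    {
        // Stage 1: drop comment lines and parse the rest.
        var logentries = from line in logs
                         where !line.StartsWith("#")
                         select new LogEntry(line);

        // Stage 2: keep the accesses of users whose name ends with the suffix.
        var user = from access in logentries
                   where access.user.EndsWith(userSuffix)
                   select access;

        // Stage 3: count accesses per page.
        var accesses = from access in user
                       group access by access.page into pages
                       select new UserPageCount(userSuffix, pages.Key, pages.Count());

        // Stage 4: keep .htm pages, most visited first.
        var htmAccesses = from access in accesses
                          where access.page.EndsWith(".htm")
                          orderby access.count descending
                          select access;

        return htmAccesses.ToList();
    }
}
```

Calling `LogQuerySketch.HtmAccessesFor(lines, "sen")` on a handful of well-formed sample lines gives one `UserPageCount` per `.htm` page. The sketch assumes every non-comment line contains both fields; a malformed line would throw in the `LogEntry` constructor.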
  5. Processing vertices; Edges (files); Inputs; Outputs
  6. Processing vertices; Edges (files); Inputs; Outputs; Free Compute Resources
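Slides 5 and 6 describe a job as processing vertices connected by edges that are files. One way to picture this is a small in-memory model in which a vertex becomes runnable once all of its input files exist. The sketch below is illustrative only: `Vertex`, `GraphSketch`, and `ExecutionOrder` are hypothetical names, not part of Dryad or LINQ to HPC, and the real runtime's scheduling is not shown in the slides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a processing vertex: it consumes and produces files.
public class Vertex
{
    public readonly string Name;
    public readonly string[] InputFiles;
    public readonly string[] OutputFiles;

    public Vertex(string name, string[] inputFiles, string[] outputFiles)
    {
        Name = name;
        InputFiles = inputFiles;
        OutputFiles = outputFiles;
    }
}

public static class GraphSketch
{
    // Returns vertex names in an order in which every vertex runs only
    // after all of its input files are available.
    public static List<string> ExecutionOrder(IEnumerable<Vertex> vertices, IEnumerable<string> initialFiles)
    {
        var available = new HashSet<string>(initialFiles);
        var pending = vertices.ToList();
        var order = new List<string>();

        while (pending.Count > 0)
        {
            // A vertex is ready when each of its inputs already exists.
            var ready = pending.Where(v => v.InputFiles.All(f => available.Contains(f))).ToList();
            if (ready.Count == 0)
                throw new InvalidOperationException("Remaining vertices wait on files that are never produced.");

            foreach (var v in ready)
            {
                order.Add(v.Name);
                foreach (var f in v.OutputFiles)
                    available.Add(f); // outputs become inputs for later vertices
                pending.Remove(v);
            }
        }
        return order;
    }
}
```

Each pass of the loop corresponds to one wave of vertices that could run in parallel if enough compute resources were free.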
  7. Components: Application that calls LINQ to HPC APIs; HPC Head Node; DSC; HPC Compute Nodes; Graph Manager; Vertex Host
       1. Submit LINQ to HPC job.
       2a. A LINQ to HPC job starts 1 basic task, assigning a node as the DGM (Graph Manager).
       2b. The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH (Vertex Host).
       3a. Graph Manager starts/stops Vertices.
       3b. LINQ to HPC Vertices read and write files.
  8. HPC Compute Nodes: Graph Manager; Vertex Host
       3a. Graph Manager starts/stops Dryad Vertices.
       3b. Vertices read and write files.
     Vertices in the logical computation graph:
       • The Graph Manager starts vertices on Vertex Hosts.
       • It preferentially schedules vertices near their input files.
     When the input is already on the cluster, local I/O can be made the common case.
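The preference on slide 8 for scheduling vertices near their input files can be sketched as a simple placement rule: pick a free host that already stores the input, and fall back to any free host otherwise. This is a minimal illustration, not the Graph Manager's actual policy; `PlacementSketch`, `ChooseHost`, and the node names in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlacementSketch
{
    // Chooses a host for a vertex that reads inputFile.
    // Preference order: a free host that stores the file locally,
    // then the first free host, then null when no host is free.
    public static string ChooseHost(
        string inputFile,
        IDictionary<string, string[]> filesByHost,
        IList<string> freeHosts)
    {
        foreach (var host in freeHosts)
        {
            string[] files;
            if (filesByHost.TryGetValue(host, out files) && files.Contains(inputFile))
                return host; // local I/O: the input is already on this host
        }

        // No free host holds the file, so the read would be remote.
        return freeHosts.FirstOrDefault();
    }
}
```

For example, with `node1` holding `a.txt` and `node2` holding `b.txt`, a vertex that reads `b.txt` lands on `node2` when both nodes are free, while a vertex that reads a file stored on neither node falls back to the first free host.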
  9. Components: Application that calls LINQ to HPC APIs; HPC Head Node; DSC; HPC Compute Nodes; LINQ to HPC Graph Manager; LINQ to HPC Vertex Host
       1. Publish to share:
            1. binaries for the LINQ to HPC job
            2. XML description of the LINQ to HPC graph
       2a. A LINQ to HPC job starts 1 basic task, assigning a node as the DGM.
       2b. The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH.
       • The DGM reads the XML description of the graph from the share and calls DSC to locate the files referenced in the XML.
       • The DVH loads the binaries for this LINQ to HPC job from the share and executes them according to commands from the DGM.
  10. DSC NODE ADD sen-cn1 /TEMPPATH:c:DryadHpcTemp /DATAPATH:c:DryadHpcData /SERVICE:sen-hn
  11. using System;
      using System.Linq;
      using Microsoft.Hpc.Linq;

      namespace MyProgram
      {
          class Program
          {
              static void Main(string[] args)
              {
                  // Connect to the cluster through its head node.
                  var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
                  var context = new HpcLinqContext(config);

                  // Read the DSC file set "MyTextData" and project each line to its length.
                  var lengths = context.FromDsc<LineRecord>("MyTextData")
                                       .Select(r => r.Line.Length);

                  Console.WriteLine("The maximum line length is {0}", lengths.Max());
              }
          }
      }
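The query on slide 11 has the same shape as an ordinary LINQ to Objects query, which makes it easy to check the logic locally. The sketch below assumes nothing about `Microsoft.Hpc.Linq` beyond what the slide shows: `LineRecordSketch` is a hypothetical stand-in for `LineRecord` with only a `Line` property, and the input is an in-memory sequence of strings instead of a DSC file set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the slide's LineRecord; only Line is assumed.
public class LineRecordSketch
{
    public string Line { get; set; }
}

public static class MaxLineLengthSketch
{
    // Local counterpart of the slide's query: wrap each string in a
    // record, project it to its length, and take the maximum.
    // Enumerable.Max throws InvalidOperationException on an empty sequence.
    public static int MaxLineLength(IEnumerable<string> lines)
    {
        return lines.Select(text => new LineRecordSketch { Line = text })
                    .Select(r => r.Line.Length)
                    .Max();
    }
}
```

A call such as `MaxLineLengthSketch.MaxLineLength(System.IO.File.ReadLines(path))` would apply the same projection to a local text file, where `path` is whatever file you choose to test with.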
  12. Programming models: LINQ to HPC (NEW)
      Distributed runtimes: MPI, SOA, LINQ to HPC runtime
      Cluster and cloud services: HPC provisioning, management, etc.
      Platform: Windows Server, Azure*
      DSC (Distributed Storage Catalog): binds individual NTFS shares together to support the LINQ to HPC distributed runtime
      * Future support planned
  13. Microsoft Big Data End-to-End: Sensors, Devices, Apps, Bots, Crawlers; Data Marts; SSAS; ERP, CRM, LOB; HPC Server; SQL EDW; SSRS; Data & Compute Intensive HPC App; Interactive Reports; Performance Scorecard; PowerPivot; Embedded BI Apps; Hadoop; Integration Services
  14. microsoft.com/learning/en/us/exam.aspx?ID=70-690
  15. www.microsoft.com/teched
      www.microsoft.com/learning
      http://microsoft.com/technet
      http://microsoft.com/msdn
      http://northamerica.msteched.com
