
byteLAKE: AI and HPC solutions



We build AI and HPC solutions. Expertise: highly optimized AI Engines and HPC Apps.

Dear friends, here is a tiny glimpse into some of our projects. It's our journey of bringing AI beyond the hype and into real-world solutions. And there is more to it than that.

It's a raw story that shaped our products and services. It's about how Machine Learning and Deep Learning move industries beyond the status quo and how their best friends HPC and Edge actually enable that transformation.

And a lot is cooking at byteLAKE now... so expect some exciting announcements very soon...

Presentation will be available only for a limited time at:

Published in: Business


  1. 1. byteLAKE Highly optimized AI Engines and HPC Apps We Build AI & HPC Solutions
  2. 2. Products and Services Cognitive Automation Edge AI Services HPC Products CFD Suite brainello Ewa Guard Federated Learning Crypto Mining Suite for Alveo FPGA Intelligent Restaurant Incubation
  3. 3. Collaboration More at:
  4. 4. Who are we? byteLAKE’s Research & Engineering: CPU/GPU/VPU/NPU, MPI, CUDA, OpenCL/OpenMP, TensorFlow, Caffe, OpenCV, Xilinx FPGA, Cloud & Data Centers, Edge Intelligence, Hadoop/Spark/… + many more. Intel, Sony, Qualcomm, Adidas, Tieto, HERE, Siemens, Samsung, Vertu, ImmobilienScout24, Nokia, Benq, BrightOne, Lenovo
  5. 5. Our Team • Marcin Rojek, Co-founder: I am fascinated by how AI innovations reshape industries. Co-founding byteLAKE lets me become a vital part of that transformation & work with the brightest and most creative minds. 15+ years of experience in global sales & engineering. Action man with a hustler’s spirit. • Krzysztof Rojek, CTO, DSc, PhD: I link byteLAKE’s business with the research & academia world. I am a huge fan and a promoter of ideas that start in the research space and land in practical, real-life business applications. DSc degree in Computer Science, internationally awarded in the HPC & AI space. • Mariusz Kolanko, Co-founder: I think AI will let us focus on creative work and leave repetitive stuff to machines. 15+ years of experience in management, operations and engineering. Successfully drove Agile initiatives in multinational environments. Open-minded in searching, focused on solving. • John Sedgwick, Customer Success, USA: AI is the future but we are starting to see and understand some of the possibilities now. AI will grow exponentially across multiple sectors and throughout the economy as technology allows us to deploy smart tactics and solve difficult problems. 15+ years in IoT, Semiconductor, Software and Services. Providing strategy and tactics to find value and deliver products to meet customer and segment needs. • Our Amazing Engineering Team: located in Poland.
  6. 6. byteLAKE among top AI companies in Poland! "It contains information on practically all meaningful companies operating in Poland which offer services or products in the field of modern technologies. We believe this map will be necessary to help both domestic and international investors looking for interesting projects in Poland.", Aleksander Kutela, President of Digital Poland Foundation
  7. 7. • Algorithms optimization – making the most of hardware components – optimal usage of precious resources for faster results at the lowest possible cost (energy needs reduced) • Detecting shapes & patterns • Advanced data analytics • Solutions for IoT/ edge, Cloud and on-premise configurations Edge AI ➢ highly optimized AI engines to analyze text, image, video, sound and time series data ➢ local AI processing, directly on a device
  8. 8. • Complex tasks automation (thru processing data from e-mails, documents, workflow or IoT systems etc.) • Industry 4.0 automation (eliminating human errors in quality systems thru data analytics and repetitive tasks monitoring and automation) • Enabling data-driven, proactive operations (finding answers hidden in the data) Cognitive Automation bespoke RPAs, supercharged with AI Custom made solutions with the ability to self-improve over time
  9. 9. HPC at byteLAKE Complex algorithms adaptation to architectures powered by Intel CPU/ NPU, Xilinx Alveo FPGA and NVIDIA GPU. Unleashing the power: • selecting the right programming model to a given problem (task parallelism, data parallelism, mixture of these two) • providing the right balance between CPUs and GPUs/FPGAs • optimizing data transfers between host memory and accelerators • code adaptation to a variety of computing platforms Bottom line: lowering TCO thru various optimizations (performance, energy efficiency, accuracy of calculations) More at: Making the most of the hardware: • Speedup: accelerating time to results for AI, CFD and FinTech • Green Computing: optimizing algorithms to reduce energy consumption • Scalability: with many accelerators within a node and clusters having many nodes
  10. 10. Crypto Mining Suite Expertise Software Services Exceptional Alveo experience: CFD Suite Acceleration and TCO Optimization More at: Customized HPC Solutions
  11. 11. • Expertise in all possible configurations – desktop, mobile, server – Tesla, Fermi, Kepler (K80, GeForce GTX Titan, Jetson), Maxwell (NVIDIA GeForce GTX 980), Pascal (P100), Tesla V100, T4 – CUDA, OpenCL, OpenACC • Several case studies delivered – AI training (machine/ deep learning) – Edge AI inferencing – Classic HPC simulations (CFD, weather) • Very active in research space – several publications for prestigious journals (Concurrency and Computation: Practice and Experience, Parallel Computing, Journal of Supercomputing etc.) Exceptional experience with NVIDIA More at:
  12. 12. Faster Results at Lower Cost AI & HPC Convergence AI CFD FinTech Expertise: Xilinx, NVIDIA & Intel
  13. 13. byteLAKE speeds up AI model training by leveraging Lenovo AI Innovation Centers • Deep Neural Network training reduced from 336 hours (2 weeks) to 6 hours: 56x speed-up • Layers: 27 • Epochs: 10 000 • Images: 400 (416x416 px) • Accuracy: 92-95% • Original HW: 2x Intel Xeon E5-2695 CPU, 256 GB of RAM • Lenovo HW: SR650, 2x Intel Xeon Gold 6148 CPU, Nvidia T4, 768 GB of RAM
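The 56x figure follows directly from the two training times quoted on the slide. A minimal C++ sketch of that arithmetic (the function and constant names are hypothetical, chosen only for this illustration):

```cpp
// Speed-up is the baseline time divided by the accelerated time,
// with both times expressed in the same unit.
constexpr double speedup(double baseline_hours, double accelerated_hours) {
    return baseline_hours / accelerated_hours;
}

// Times from the slide: 336 hours (2 weeks) before, 6 hours after.
constexpr double kTrainingSpeedup = speedup(336.0, 6.0);  // 56.0
```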
  14. 14. Our Research Studies GO PUBLIC! More at:
  15. 15. AI automation and data-driven, proactive operations across industries. Accelerating time to results at lowest costs possible. Learn more at:
  16. 16. Products
  17. 17. CFD Suite Collection of fluid dynamics algorithms, highly optimized for Xilinx Alveo FPGA powered systems. Faster results, lower TCO.
  18. 18. Why FPGAs? • No predefined instruction set or underlying architecture • Developers customize the architecture to their needs – Custom data paths – Custom bit-width – Custom memory hierarchies • Excels at all types of parallelism – Deeply pipelined (e.g. Video codecs) – Bit manipulations (e.g. AES, SHA) – Wide data path (e.g. DNN) – Custom memory hierarchy (e.g. Data analytics) • Adapts to evolving algorithms and workload needs FPGAs - the Ultimate Parallel Processing Device
  19. 19. • Xilinx pioneered C to FPGA compilation technology (aka “HLS”) in 2011 • Source code in C, C++ or OpenCL is compiled for the FPGA, for example:
        loop_main: for (int j = 0; j < NUM_SIMGROUPS; j += 2) {
          loop_share: for (uint k = 0; k < NUM_SIMS; k++) {
            loop_parallel: for (int i = 0; i < NUM_RNGS; i++) {
              mt_rng[i].BOX_MULLER(&num1[i][k], &num2[i][k], ratio4, ratio3);
              float payoff1 = expf(num1[i][k]) - 1.0f;
              float payoff2 = expf(num2[i][k]) - 1.0f;
              if (num1[i][k] > 0.0f) pCall1[i][k] += payoff1;
              else pPut1[i][k] -= payoff1;
              if (num2[i][k] > 0.0f) pCall2[i][k] += payoff2;
              else pPut2[i][k] -= payoff2;
            }
          }
        }
  20. 20. General Architecture with FPGAs • x86 CPU: Host Application, Runtime and Drivers, Acceleration API (C/C++ code with OpenCL API calls) • PCIe link between CPU and FPGA • FPGA: Accelerated Functions, DMA Engine, AXI Interfaces (C/C++ or OpenCL C) • byteLAKE’s CFD Suite on the Xilinx Acceleration Platform
  21. 21. • FPGAs are best adapted to applications where: – most computational effort concentrates on a small portion of the code per pipeline which is repeatedly executed for a large dataset – input/output communications load is small when compared to the computational load, to avoid saturating the host memory-FPGA interface. ➢CFD codes fulfill these requirements and, therefore, appear as good candidates to benefit from FPGA-based accelerators. FPGAs – highway for CFD algorithms
  22. 22. Where does the acceleration happen? Typical CFD workflow: from CAD to MESH… (meshing) …to CFD simulation and visualization. • MESH conversion (input) • byteLAKE’s CFD Suite • Data output for visualization. Labels: up to 5% of simulation time; major workload. Image source: OPENFOAM® is a registered trademark of ESI Group. This offering is not approved or endorsed by ESI Group, the producer of the OpenFOAM software and owner of the OPENFOAM® and OpenCFD® trademarks.
  23. 23. • Key features – collection of fluid dynamics algorithms – highly optimized for Xilinx Alveo FPGAs – on-premise and in the Cloud – straightforward integration – AI powered • Benefits – Acceleration = faster results – Green Computing = reduced energy – Lower TCO = ultimate cost reduction – Excellent Performance / Watt = reduced operational costs byteLAKE’s CFD Suite
  24. 24. Optimized CFD across industries byteLAKE’s CFD Suite Advection CFD Solvers Thomas Algorithm Green Energy / Weather Automotive Construction Chemistry / Pharma Oil & Gas / Pipe Flow Algorithms Roadmap
  25. 25. • MPDATA (Multidimensional Positive Definite Advection Transport Algorithm) – main part of the dynamic core of the Eulerian/ semi-Lagrangian (EULAG) model – EULAG (MPDATA+elliptic solver) is the established computational model, developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios – currently, this model is being implemented as the new dynamic core of the COSMO (Consortium for Small-scale Modeling) weather prediction framework – advection (together with the elliptic solver) is a key part of many frameworks that allow users to implement their simulations • Advection – movement of some material (dissolved or suspended) in the fluid. Algorithm: Advection (MPDATA) General Information
  26. 26. • Easy to integrate – Can work as a standalone application or be called as a function via our dedicated interface (e.g. can be called as a function with input and output arrays) – Compatible with frameworks like TensorFlow for integrating deep learning with CFD codes • Easy to visualize the results – Results can be stored in a raw format as a binary file of the output arrays or converted via byteLAKE tools to a ParaView format • See benefits already in 1-node HPC configurations – Strongly adapted to Alveo U250, where a single card supports the max size of arrays: 2.1 Gcells (max compute domain: 1264 x 1264 x 1264) ~ 60 GB • Scalable to many cards per node and many nodes Algorithm: Advection (MPDATA) Compatibility
  27. 27. • Compute domain divided into 4 sub-domains • Host sends data to the FPGA global memory • Host calls Advection kernel to execute it on FPGA (kernel is called many times) • Each kernel call represents a single time step • FPGA sends the output array back to host Algorithm: Advection (MPDATA) Architecture
  28. 28. • First-order-accurate step of the advection scheme. Second-order is an option. • Input data – Array X – non-diffusive quantity (e.g. temperature of water vapor, ice, precipitation, etc.) – Arrays V1, V2, V3 - each of them stores the velocity vectors in one direction – (optional) Arrays Fi, Fe - implosion and explosion forces acting on a structure of X – (optional) Array D with density – (optional) Array rho which defines an interface for the coupling of COSMO and EULAG dynamic core (used to provide the transformation of the X variable) – DT – time step (scalar) • Output data – single X array that was updated in the given time step Algorithm: Advection (MPDATA) Technical Information
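The call pattern from the Architecture slide (one kernel call per time step) and the X and DT inputs listed above can be sketched in plain C++17. This is a minimal CPU-only illustration under loud assumptions: it is one-dimensional, folds a constant non-negative velocity and the time step into a single Courant number, applies periodic boundaries, and performs only a first-order upwind (donor-cell) step. The names `advect_step` and `run_simulation` are hypothetical; this is not byteLAKE's kernel and it makes no FPGA or OpenCL calls.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 1D stand-in for the advection kernel: one first-order upwind
// (donor-cell) step with periodic boundaries. Assumes a constant velocity
// v >= 0 and a Courant number c = v * dt / dx in [0, 1].
template <std::size_t N>
constexpr std::array<double, N> advect_step(const std::array<double, N>& x,
                                            double courant) {
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t left = (i + N - 1) % N;  // periodic upwind neighbour
        out[i] = x[i] - courant * (x[i] - x[left]);
    }
    return out;
}

// Host-side loop: each call to advect_step stands for one kernel launch,
// i.e. one time step. On a real system the array would be copied to FPGA
// global memory before the loop and read back after it.
template <std::size_t N>
constexpr std::array<double, N> run_simulation(std::array<double, N> x,
                                               double courant,
                                               int time_steps) {
    for (int step = 0; step < time_steps; ++step) {
        x = advect_step(x, courant);
    }
    return x;
}

// Example: with a Courant number of 1 the pulse moves exactly one cell per step.
constexpr std::array<double, 4> kInitial{1.0, 0.0, 0.0, 0.0};
constexpr auto kAfterOneStep = run_simulation(kInitial, 1.0, 1);  // {0, 1, 0, 0}
```

Keeping the time loop on the host and the per-step update in a separate function mirrors the split described on the slide, where the host repeatedly invokes the same accelerated kernel.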
  29. 29. Algorithm: Advection (MPDATA) Benchmark. Three charts: Performance (the higher the better), Energy (the lower the better) and Performance/W (the higher the better), each comparing Intel Xeon E5-2995 (two entries), Intel Xeon Gold 6148, Intel Xeon Platinum 8168 and Xilinx Alveo U250.
  30. 30. Algorithm: Advection (MPDATA) TCO

      | Simulation Setup | CPU [USD] | Accelerator [USD] | Price [USD] | # of devices per node | Price * # |
      |---|---|---|---|---|---|
      | Intel Xeon Platinum 8168 | 8000 | 0 | 8000 | 2 | 16000 |
      | Xilinx U250 + Intel Xeon E3 1220 | 200 | 5000 | 5200 | 1 | 5200 |

      With 1-2 Alveo cards per node, the Xilinx setup costs 5200-10200 USD.

      | Alveo Acceleration | TCO CPU/FPGA (1 Alveo/node) | TCO CPU/FPGA (2 Alveos/node) |
      |---|---|---|
      | 1 | 0.325 | 0.64 |
      | 2 | 0.65 | 1.28 |
      | 3 | 0.975 | 1.91 |
      | 4 | 1.3 | 2.55 |
      | … | … | … |
      | 10 | 3.25 | 6.38 |
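The figures on the TCO slide can be reproduced with a few lines of arithmetic. The sketch below assumes, based only on the numbers shown, that each "TCO CPU/FPGA" entry equals the Alveo acceleration factor multiplied by the ratio of FPGA node price to CPU node price. The slide does not state this formula, and the function and constant names are hypothetical.

```cpp
// Price of a CPU-only node: CPUs per node times the unit price.
constexpr double cpu_node_price(double cpu_usd, int cpus) {
    return cpu_usd * cpus;
}

// Price of an FPGA node: one host CPU plus the Alveo cards.
constexpr double fpga_node_price(double host_cpu_usd, double card_usd, int cards) {
    return host_cpu_usd + card_usd * cards;
}

// Assumed relation behind the table:
// acceleration * (FPGA node price / CPU node price).
constexpr double table_entry(double acceleration, double fpga_node_usd,
                             double cpu_node_usd) {
    return acceleration * fpga_node_usd / cpu_node_usd;
}

constexpr double kCpuNode  = cpu_node_price(8000.0, 2);          // 16000 USD
constexpr double kOneAlveo = fpga_node_price(200.0, 5000.0, 1);  // 5200 USD
constexpr double kTwoAlveo = fpga_node_price(200.0, 5000.0, 2);  // 10200 USD

constexpr double kEntry1xOneCard  = table_entry(1.0, kOneAlveo, kCpuNode);   // 0.325
constexpr double kEntry10xTwoCard = table_entry(10.0, kTwoAlveo, kCpuNode);  // 6.375, shown as 6.38
```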
  31. 31. • Applications include – Characterizing the sub-grid scales effect in global numerical simulations of turbulent stellar interiors – Comparing anelastic and compressible convection-permitting weather forecasts for the Alpine region – Modeling the prediction of forest fire spread – Flood simulations – Biomechanical modeling of brain injuries within the Voigt model (a linear system of differential equations where the motion of the brain tissue depends merely on the balance between viscous and elastic forces) – Simulation of gravity wave turbulence in the Earth's atmosphere – Simulation of geophysical turbulence in the Earth's atmosphere – Ocean modeling: simulation of three-dimensional solitary wave generation and propagation using EULAG coupled to the barotropic NCOM (Navy Coastal Ocean Model) tidal model Applications of Advection (MPDATA) byteLAKE’s CFD Suite
  32. 32. • Applications include, cont. – Oil and Gas: provides a significant return on investment (ROI) in seismic analysis, reservoir modelling and basin modelling. Used also to monitor drilling and seismic data to optimize drilling trajectories and minimize environmental risk. – AgriTech: models to track and predict various environmental impacts on crop yield such as weather changes. For example, daily weather predictions can be customized based on the needs of each client and range from hyperlocal to global. • Example adopters – Poznan Supercomputing and Networking Center, Poland: prognosis of air pollution – European Centre for Medium-Range Weather Forecasts, UK: weather forecast – Institute of Meteorology and Water Management, Poland: weather forecasts – German Aerospace Center: aeronautics, transport and energy areas – University of Cape Town, South Africa: weather simulation – Montreal University: weather simulation – Warsaw University: ocean simulation Applications of Advection (MPDATA), cont. byteLAKE’s CFD Suite Full list
  33. 33. Ewa Guard AI helping address humanity's greatest challenges like reforestation (forestry management) and fly tipping detection (illegal dumping).
  34. 34. Forestry Management with AI: AI analyzing drone footage & reducing manual work in reforestation
  35. 35. Simple to use: 1. Integrate byteLAKE’s Ewa Guard into your system (on-premise, Cloud) 2. Ewa Guard will analyze drone footage and provide analytics including: - number of young trees - location of selected tree types / condition Ewa Guard can be trained to extract more information from images and videos. Let us know what you need!
  36. 36. Video:
  37. 37. Detecting anomalies
  38. 38. Detecting young trees (image with each detected object labeled “Tree”)
  39. 39. Mitigating the Impact of Deforestation Did you know that Earth loses ~19 million acres of forests per year, which is equal to 27 soccer fields every minute? Video:
  40. 40. AI engine to accelerate and automate document processing.
  41. 41. AI engine to accelerate and automate document processing. Tangible benefits and ROI thru: • significantly reduced document processing time • eliminated human error • automation. Meet brainello. Bottom line: brainello saves money and gives teams a lot of their time back.
  42. 42. ➢RPA, supercharged with AI (no need to prepare templates) ➢Various formats of documents supported ➢Easy integration (standalone engine or integrated into larger system thru many supported interfaces) ➢Self-improving over time (assisted learning as an option) Key features
  43. 43. Integrated with the Bpower2 documents workflow: 7x faster invoice processing. More at:
  44. 44. Beyond invoices automation: Roadmap • Document processing automation framework: – Content Analytics (finding key parts like dates, regulations and setting reminders, enforcing rules etc.) – Document Type Detection (sorting based on content and distributing to appropriate workflows) – Intelligent form filling (for funds and repetitive forms)
  45. 45. Explainer Video Video:
  46. 46. In Collaboration with Microsoft
  47. 47. Federated Learning Bringing collaborative learning with privacy to AI. Enabling machines and AI models to learn from each other across global networks.
  48. 48. Federated Learning enables devices (IoT) to learn from each other
  49. 49. How it works Machine Learning Sensor Data Time Local training & inferencing Intelligent Device Aggregating local models (black box) Federated Learning Data Center
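The "aggregating local models" step in the diagram can be illustrated with a sample-weighted average of model weights, in the style of federated averaging. The slide treats aggregation as a black box, so this choice is an assumption made for illustration and not a description of byteLAKE's method; `LocalModel` and `aggregate` are hypothetical names, and the sketch needs C++17.

```cpp
#include <array>
#include <cstddef>

// One local model: its weight vector and the number of samples it was trained on.
template <std::size_t D>
struct LocalModel {
    std::array<double, D> weights;
    std::size_t samples;
};

// Sample-weighted average of the local models (federated-averaging style).
// Only weights and sample counts are shared; raw sensor data stays on the device.
template <std::size_t D, std::size_t M>
constexpr std::array<double, D> aggregate(
        const std::array<LocalModel<D>, M>& models) {
    std::array<double, D> global{};
    double total = 0.0;
    for (const auto& m : models) total += static_cast<double>(m.samples);
    if (total == 0.0) return global;  // no training data anywhere: all zeros
    for (const auto& m : models) {
        const double share = static_cast<double>(m.samples) / total;
        for (std::size_t j = 0; j < D; ++j) global[j] += m.weights[j] * share;
    }
    return global;
}

// Example: two devices, the second trained on three times as much data.
constexpr std::array<LocalModel<2>, 2> kDevices{{
    {{{1.0, 2.0}}, 1},
    {{{3.0, 4.0}}, 3},
}};
constexpr auto kGlobal = aggregate(kDevices);  // {2.5, 3.5}
```

Weighting by sample count lets devices that have seen more data pull the shared model further; a plain unweighted mean would be an equally valid starting point.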
  50. 50. Manufacturing Simulation ➢ learning to predict pressure changes (locally) ➢ learning locally & from already trained models (federated learning). Setup: Fan 1, Fan 2, barometric pressure sensor, Filter 1, Filter 2, Styrofoam pellets. Styrofoam pellets cause filter clogging & air pressure changes.
  51. 51. ✓Enables Scalability (Decentralized AI enables IoT/devices to learn from each other) ✓Solves low-throughput and high-latency challenges (Local AI models provide real time response and lower power consumption) ✓Improves accuracy (Have smarter models via aggregation of many local models) ✓Reduces training time (Benefit from local training and already trained models in the neighborhood) ✓Lowers the cost of training (Bringing data from all devices is expensive) ✓Ensures privacy (Sensitive data stays local) Benefits of Federated Learning White Paper
  52. 52. Edge AI byteLAKE’s Edge AI Starter Kit accelerating development of Computer Vision enabled products.
  53. 53. Traffic Analytics byteLAKE’s Edge AI Starter Kit
  54. 54. ✓ byteLAKE’s Cloud Framework ▪ Communication between Edge device and Microsoft Azure ▪ Visualization of gathered information (reporting, statistics, diagnostics etc.) Starter Kit: On-device AI-powered Video Analytics Real-Time Objects Detection On Edge Dashboard Edge AI Microsoft Azure Video:
  55. 55. Starter Kit: AI for Lenovo’s Smart Edge Devices Models (DNN) Interaction Data Edge AI Solution Instant Response Advanced Analytics
  56. 56. ✓ byteLAKE’s low-level C++ mod (integrates Basler’s cam with NCS SDK) ✓ byteLAKE’s Computer Vision asynchronous model (enables real-time on-device objects classification) ✓ OpenCV code to visualize the results Starter Kit: On-device Objects Recognition for Retail Movidius Neural Compute Stick Raspberry Pi Real time objects classification DNN TensorFlow Pylon API Raspbian Stretch OS NCS SDK OpenCV Pineapple! 97.9% Model Video:
  57. 57. ✓Enables Scalability (Decentralizes AI services & makes it easier to expand the IoT ecosystems) ✓Enables real-time AI experience (By using modern low power, high performance, small form factor accelerators) ✓Solves round-trip latencies (Deploying AI directly on the device enables faster response time) ✓Eliminates intermittent connectivity related issues (No need for sending the data from the device to external AI services and waiting for results) ✓Reduces total cost of ownership (AI-enabled devices pre-process the data and send the results to external services vs raw data) ✓Data can stay local on the device (Having AI on the device allows for sending the data to external storage selectively) Benefits of Edge AI
  58. 58. More Case Studies
  59. 59. Weather engine optimized for Europe’s fastest supercomputer (Piz Daint): 12x better performance, 30% reduced energy consumption • Our solution: machine learning managed, dynamic application of mixed precision • Highlights: – Dynamic estimation of the algorithm’s power consumption as a function of the frequency of the processor and the number of cores. – Energy-aware task management – Auto-tuning procedure taking into account algorithm- and GPU-specific parameters for auto-configuring purposes. – Result: better performance, less energy consumed. Our mechanism provides energy savings of up to 1.43x compared to the default Linux scaling governor. More at:
  60. 60. byteLAKE’s Software Autotuning: HPC Simulations optimized by Machine Learning, automatic adaptation of an algorithm to a specific hardware architecture • Enables software portability between different architectures: – CPU: different number of cores, hierarchy of memory, cache sizes; – GPU: register file reuse, shared memory utilization, GPUDirect support, reduction of global memory transactions; – HPC: selecting the right number of nodes, scalability estimation, overlapping data transfers & communication; – Hybrid: load balancing (i.e. selecting appropriate parts of the algorithm for different devices executing the code with different performance) • Helps build adaptable algorithms: – automatically selecting the size of data blocks, number of threads, number of processes, precision of data (i.e. depending on algorithm or input data characteristics some data can be stored using double, single or half precision format) – selecting the criterion of optimization: performance, energy consumption, accuracy of result, mix (i.e. a mix of performance & energy to minimize energy and keep execution time, or to optimize performance & keep the energy budget) • Can auto-configure the system and provide the most suitable compiler flags
  61. 61. Dynamic Mixed Precision. Optimize execution time • Ported geophysical model (EULAG) to a parallel computing supercomputer architecture (Piz Daint) • Used Machine Learning (Random Forest) to optimize various numerical parameters such as data block sizes, number of GPU streams and sizes of vector data types. Optimize energy efficiency • Developed a mechanism (mixed precision) that keeps the energy consumption of supercomputers low while keeping code performance at the highest possible level • Developed a framework based on a software automatic tuning approach. Results ✓ 10 times faster ✓ Then we improved it even more, reaching a further speed-up of 1.27x ✓ Energy consumption reduced by 33% ✓ Optimized GPU usage while keeping the accuracy of computations. Highlights: • C++, CUDA, MPI, OpenMP
  62. 62. Dynamic Mixed Precision, cont. We reduced energy consumption by 33% and increased performance by a factor of 1.27x using 25% fewer GPUs. We kept the accuracy of the results at the same level as when using double precision arithmetic.
  63. 63. Research Case (concluded): Reducing the energy consumption in HPC simulation • Goal: reducing the energy consumption of the MPDATA algorithm (an algorithm for numerical simulation of geophysical fluid flows on micro-to-planetary scales, used especially in numerical weather prediction). • Hardware: Piz Daint supercomputer (ranked 3rd on the TOP500 list), equipped with the most advanced Pascal-based GPUs: NVIDIA Tesla P100. • Idea: applying mixed precision arithmetic, i.e. performing part of the operations in single precision (32 bits) and the rest in double precision (64 bits). • Why do we use it? A single simulation of the weather phenomenon needed more than 10^13 operations. We suspected that not all of them need double precision arithmetic to preserve the same simulation accuracy. We believe that controlling the precision and accuracy of numerical results can increase performance, decrease energy consumption and still provide highly accurate results. • Solution: we used unsupervised learning to estimate the correlation between the precision of each matrix and its influence on the criteria (energy, accuracy of results). During a short, dynamic training stage we identified the set of operations that could be performed in single precision without loss of accuracy in the weather simulation. • Results: we reduced energy by 33% and increased performance by a factor of 1.27x using 25% fewer GPUs, keeping the accuracy of the results at the same level as when using double precision arithmetic.
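Why the choice of precision has to be made per operation can be seen in a tiny example. The C++17 sketch below is a hypothetical illustration, not the MPDATA code and not byteLAKE's selection mechanism: the data stays in 32-bit storage in both variants, and only the accumulator differs. With the values chosen here, the all-single-precision sum silently drops the small terms, while widening just the accumulator recovers them.

```cpp
#include <array>
#include <cstddef>

// All single precision: both the stored values and the accumulator are 32-bit.
template <std::size_t N>
constexpr float sum_single(const std::array<float, N>& values) {
    float acc = 0.0f;
    for (float v : values) acc += v;
    return acc;
}

// Mixed precision: values stay in 32-bit storage, only the accumulator is 64-bit.
template <std::size_t N>
constexpr double sum_mixed(const std::array<float, N>& values) {
    double acc = 0.0;
    for (float v : values) acc += static_cast<double>(v);
    return acc;
}

// 16777216 is 2^24, the point where a 32-bit float can no longer represent
// every integer, so adding 1.0f to it rounds back to 2^24.
constexpr std::array<float, 5> kValues{16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f};
constexpr float kSingle = sum_single(kValues);  // 16777216: the four 1.0f terms are lost
constexpr double kMixed = sum_mixed(kValues);   // 16777220
```

The storage cost is the same in both variants, which is the appeal of mixed precision: memory traffic stays at the single-precision level and only the accuracy-sensitive operation pays for 64 bits.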
  64. 64. Many projects with Lenovo
  65. 65. Research Case (concluded): Predictive Maintenance for Industry 4.0 • Challenge: perform analytics on time series data • byteLAKE approach: designed a concept framework that can analyze time series data from various sensors and become a foundation for predictive maintenance systems. • It relies on two key mechanisms: – Feedback Loop Control (FLC). It selects the machine learning model that fits best. – Environment Recognition (ER). It finds the most critical parameters for a given scenario. More at:
  66. 66. Research Case: Reconfiguring HPC Simulation with AI to optimize performance and energy. Configuration parameters: node count, accelerators per node, memory alignment, streams count, buffering types, cpu cores, memory policy, … Ca. 5000 possible configurations. We develop a Machine Learning module in order to select the most fitting configuration. This module utilizes, among others, the supervised learning method with the random forest algorithm. The main functionality of the module is to prune the search space in order to eliminate the worst configurations. In this way we obtain a small set that contains the best configuration in about 90% of cases. More at:
  67. 67. byteLAKE’s AI powered HPC Scheduler: dynamic redistribution of HPC resources during the simulation, shrinking or expanding the number of nodes at runtime. What’s the goal? • maximizing the performance of the entire HPC cluster, instead of a single job. How does it work? • all jobs, or a specific set of jobs, are considered as the subject of optimization, not only a single job. • consequently, a single job within a given set can be executed with lower performance; however, the entire set of jobs will be executed faster or more energy efficiently (depending on the selected criterion) Features: • can be integrated with the SLURM system (accessible from most computing clusters) • can manage part of the cluster resources (nodes 0-16) or specific tasks submitted to the cluster (algorithm A, B, C) • needs to be integrated with the algorithm
  68. 68. HPC Scheduler: results. Setup: 24 nodes available, 3 jobs. Application scalable up to 16 nodes (using more than 16 nodes results in performance decrease). Traditional SLURM: • run app0 (16 nodes) • wait for (app0) • run app1 (16 nodes) • wait for (app1) • run app2 (16 nodes) • wait for (app2) Result: • 16 nodes utilized • 3 series of execution. byteLAKE’s AI powered HPC Scheduler: • run app0 (16 nodes) • shrink app0 to 8 nodes and run app1 (16 nodes) • shrink app1 to 8 nodes and run app2 (8 nodes) Result: • 24 nodes utilized • 3 applications each using 8 nodes, all running in parallel • Performance: efficiency of 8 nodes greater than 16 as 16 is max. scalability • Each node more utilized
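The idea of pruning a large configuration space before measuring anything can be sketched as follows. In the slide the predictions come from a random forest; that model is not reproduced here. The C++17 sketch simply takes predicted run times as input and keeps the configurations whose prediction is within a chosen slack of the best one. The threshold rule, the names `prune` and `count_kept`, and the sample numbers are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Keep a configuration only if its predicted run time is within
// (1 + slack) of the best prediction. The predictions would come from a
// trained model; here they are plain inputs.
template <std::size_t N>
constexpr std::array<bool, N> prune(const std::array<double, N>& predicted_time,
                                    double slack) {
    static_assert(N > 0, "need at least one configuration");
    double best = predicted_time[0];
    for (double t : predicted_time) {
        if (t < best) best = t;
    }
    std::array<bool, N> keep{};
    for (std::size_t i = 0; i < N; ++i) {
        keep[i] = predicted_time[i] <= best * (1.0 + slack);
    }
    return keep;
}

// Number of configurations that survive pruning and still need a real run.
template <std::size_t N>
constexpr std::size_t count_kept(const std::array<bool, N>& keep) {
    std::size_t n = 0;
    for (bool k : keep) n += k ? 1 : 0;
    return n;
}

// Illustrative predictions for five configurations (made-up numbers, not measurements).
constexpr std::array<double, 5> kPredicted{4.0, 1.0, 1.2, 3.0, 1.1};
constexpr auto kKeep = prune(kPredicted, 0.25);         // keeps indices 1, 2 and 4
constexpr std::size_t kSurvivors = count_kept(kKeep);   // 3
```

A larger slack keeps more candidates and raises the chance that the truly best configuration survives, at the cost of more benchmark runs afterwards.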
  69. 69. Appendix
  70. 70. Typical engagement scenario • AI Workshops: Vast Knowledge & Experience. Thanks to our expertise in AI & HPC, we are able to deliver exceptional solutions fast and efficiently. While working together with our clients, we help them solve complex and data rich problems with uniquely designed and optimum solutions. • Proof of Concept: To better understand the benefits of the concept solution, we prepare a proof of concept demo showcasing select functionalities. • Solution delivery: When everything is agreed, we deliver a complete solution iteratively, in an Agile fashion. • Products co-development • Technology Integration • White label services
  71. 71. Sprint-based Agile Delivery: Backlog, Planning, Execution, Potentially deliverable product, Inspect & adapt. We deliver projects in Agile fashion • Backlog Building: it’s the “what” part, which tells us what we are to achieve • Planning: it’s the “how” part, which tells us how we can achieve the “what” • Execution: building a product in time-boxed iterations (sprints) • Gradual product delivery: each sprint ends with a delivery of a so-called potentially deliverable product • Inspect & adapt: reviews, lessons learnt, continuous improvement ➢ Cycle repeats until product is finalized ➢ Sprints can be grouped in milestones (I, II, III, …)
  72. 72. byteLAKE’s AI Workshops: Understand, Prepare, Execute. Knowledge of well-established AI/HPC solutions, frameworks, algorithms, latest and greatest research developments, … Optimization. We help choose the ones that best fit your business needs. Benefit from both: Academic and Business Worlds. Vast knowledge in Machine & Deep Learning. Massive experience in building HPC Solutions. Architecture analysis and audit.
  73. 73. Every success starts with a dream. We help bring those dreams to life. Solution delivery: (1) Understand (2) Prepare (3) Execute (4) Evaluate
  74. 74. Marcin Rojek, co-founder. I have been fascinated by how digital technologies impact our lives for over two decades. After earning my master's degree in computer science, I worked for various global companies including: Siemens Mobile, BenQ-Siemens, Tieto, Ericsson, ST-Ericsson, Sony, Qualcomm, Samsung and more. I began my career as an engineer and then moved into management positions, marking the change with a postgraduate certification in project management. Then I gained international experience while leading various initiatives in global sales & delivery teams. These included: new business development/sales, R&D build-up & leadership in multicultural environments. I successfully delivered software products for Consumer Electronics, Semiconductor and Telecom industries across European, US and Asian markets. Some say that Artificial Intelligence is the new electricity and I fully agree with that statement. It is amazing to see how Machine and Deep Learning reshape almost every industry and it is even more amazing to become a vital part of that transformation. Thus I decided to co-found byteLAKE in 2016.
  75. 75. Mariusz Kolanko, co-founder. After completing my master’s degree program in computer science I continued the fascinating engineering journey as a software developer. Later on I shifted focus into the areas of project management and customer operations. I have worked for several global brands like: TRW, Siemens Mobile, Benq-Siemens, Tieto, BrightOne, Nokia, Vertu, Here, Adidas, ImmobilienScout24 and more. My business experience is based on a combination of software engineering efforts mixed with successful Agile/ Scrum/ Kanban implementations while wearing various hats of a coach, advisor, project manager and a scrum master. I worked with multicultural and distributed teams across Europe, delivering products for global markets. My practical knowledge is backed by numerous certificates, e.g. Scrum Master and Scaled Agile Framework. In 2016 I decided to combine my passion for technology with natural skills in building efficient teams and became a co-founder of byteLAKE.
  76. 76. • Academic Roots – Assistant Professor at Czestochowa University of Technology, PL – DSc, PhD degree in Computer Science (Parallel Computing, GPGPU, self-adaptable codes) – On track towards gaining a professor degree • Worldwide recognized researcher byteLAKE’s CTO Krzysztof Rojek Continuously contributes with new research results (mainly in the space of mathematical designs and optimizations for heterogeneous computing platforms)
  77. 77. • Vast experience in business projects 2008-2010 Research & Engineering in collaboration with IBM – Background: IBM’s CELL processors became first accelerator for supercomputing architectures. New mathematical models were needed for software to make the most of new hardware. – Task: optimize mathematical algorithms for BLAS (Basic Linear Algebra Subprograms) software package to guarantee maximum utilization of processors’ available performance. – Result: CELLs-based computing capacity utilized in 99% + optimized memory access management. 2012-2015 Numerical algorithm re-design for a weather forecasting institution – Background: A weather forecasting institute (IMGW), together with academic institutions, won a grant for the purpose of renovating their software assets and porting them to the modern hardware. – Task: Re-design geophysical simulation algorithm (MPDATA) for parallel computing architecture and adapt it to the CPU+GPU – based supercomputing environment (PizDaint in that case). – Result: MPDATA algorithms has been completely redesigned, parametrized and ported to the massively parallel, many-CPU/GPU architectures. Overall the software’s performance has been increased by 10 times. byteLAKE’s CTO Krzysztof Rojek, cont.
  78. 78. byteLAKE’s CTO Krzysztof Rojek, cont. • Vast experience in business projects 2017+ Optimization of Geophysical Algorithms – Background: byteLAKE & Megware want to build an automated algorithm optimization platform. – Task: use AI to boost the performance and decrease the energy consumption of the weather forecasting model. – Result: so far designed a mathematical scheduling model, a custom-made random forest AI and dynamic mixed precision arithmetic that boost performance further by a factor of 1.32 (12 times from the initial state) and cut energy consumption by 34%. 2017 AI-based model to process time-series data – Background: A factory in the US wanted to upgrade their predictive maintenance model, which uses time-series data from various IoT sensors measuring filters’ delta pressure, humidity, dust level etc. – Task: design a concept model that could perform analytics on time-series data and point to the most influential parameters for a given phenomenon (e.g. filter clogging). – Result: designed a feedback-loop-control model that incorporates several AI algorithms in order to extrapolate given parameters based on their historic data and mechanical characteristics. First concepts with statistical simulations are currently in trials.
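To make the idea of dynamic mixed precision arithmetic more concrete, here is a minimal Python sketch. It is not byteLAKE's implementation: the function names (`to_f32`, `sum_f32`, `sum_f64`, `choose_precision`) and the selection rule (compare a single-precision result with a double-precision reference on a small sample) are assumptions made purely for illustration. It only shows the general principle of using the cheaper precision wherever the measured error stays within a tolerance.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate in emulated single precision (rounding after every add)."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_f64(values):
    """Accumulate in double precision (Python's native float)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def choose_precision(sample, rel_tol):
    """Hypothetical selection rule: return 'f32' if single precision stays
    within rel_tol of the double-precision reference on the sample,
    otherwise return 'f64'."""
    ref = sum_f64(sample)
    low = sum_f32(sample)
    rel_err = abs(low - ref) / abs(ref) if ref != 0.0 else abs(low)
    return "f32" if rel_err <= rel_tol else "f64"


if __name__ == "__main__":
    sample = [0.1] * 1000
    for tol in (1e-3, 1e-9):
        print(f"rel_tol={tol:g}: use {choose_precision(sample, tol)}")
```

A loose tolerance lets the cheaper single-precision path through, while a strict one forces the fallback to double precision. A production system would typically make this decision per kernel or per array and would also weigh measured run time and energy.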
  79. 79. byteLAKE’s CTO Krzysztof Rojek, cont. • Research & Engineering key contributions – Algorithm parallelization and adaptation to hybrid CPU-GPU platforms – Machine and Deep Learning in various aspects, e.g. new techniques for self-adapting source code, multi-objective code optimization (energy, performance, accuracy), multimedia file analytics, big data analysis etc. – Adaptive scheduling with online modeling to minimize energy consumption while meeting given time constraints. • Many international awards, e.g.: – “Outstanding monograph” about parallel computing, Polish Academy of Sciences – “Excellent Paper” about algorithms for heterogeneous architectures, Hong Kong – Grant from IBM for organizing a workshop dedicated to programming and using advanced multicore Cell/B.E. processors
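The energy-versus-time trade-off behind adaptive scheduling can be sketched in a few lines of Python. This is an illustrative assumption, not the published method: the `Config` type, its fields and the example numbers are hypothetical, and the predicted time and energy values stand in for whatever an online model would supply at run time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """Hypothetical execution configuration with model-predicted costs."""
    name: str
    predicted_time_s: float
    predicted_energy_j: float


def pick_config(configs: Sequence[Config], deadline_s: float) -> Optional[Config]:
    """Return the lowest-energy configuration that meets the deadline,
    or None if no configuration is fast enough."""
    feasible = [c for c in configs if c.predicted_time_s <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.predicted_energy_j)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    candidates = [
        Config("cpu_only", predicted_time_s=120.0, predicted_energy_j=9000.0),
        Config("cpu_gpu_full_clock", predicted_time_s=40.0, predicted_energy_j=7000.0),
        Config("cpu_gpu_low_clock", predicted_time_s=55.0, predicted_energy_j=5200.0),
    ]
    print(pick_config(candidates, deadline_s=60.0))
```

In an adaptive setting the predictions would be refreshed as measurements arrive and the selection repeated, so the schedule follows the workload instead of being fixed up front.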
  80. 80. byteLAKE Team • Founded in 2016 by a group of business and technology professionals who had worked together in various configurations for years before. • Most of us have been schooled by Fortune 500 corporations across the EU, USA and Asia. Some followed the academic path and earned DSc and PhD degrees in computer science, math and AI applications. • Together, we combine business and academia. It lets us innovate in a way that was not possible in the past. • We employ excellent AI & HPC specialists and researchers, as well as people with the right attitude and a willingness to learn fast.
  81. 81. Jump Start Product Development with us! 1. Listen Actively: We start with a consultancy session to better understand our client’s requirements & assumptions. 2. Suggest: We thoroughly analyze the gathered information and prepare a draft offer. 3. Agree: We fine-tune the offer further and wrap up everything into a binding contract. 4. Deliver: Finally, the execution starts. We deliver projects in a fully transparent, Agile (Scrum-based) fashion. Simplicity, straightforward communication and no headaches!
  82. 82. Thank you! We build AI and HPC solutions.