At Netweb we believe that innovation is a critical business need. As data analytics, high-performance computing and artificial intelligence continue to evolve, we are building solutions to help you keep pace with the constantly evolving landscape.
How to build continuous processing for a 24/7 real-time data streaming platform? - GetInData
You can read our blog post about it here: https://getindata.com/blog/how-to-build-continuously-processing-for-24-7-real-time-data-streaming-platform/
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G... - GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines, although it is not the youngest technology. The talk covers all the details of migrating pipelines from the old Hadoop platform to Kubernetes, managing everything as code, monitoring all of NiFi's corner cases, and making it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
GetInData is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of the best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including, among others, Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
In this session, Luciano will walk you through a real use-case pipeline that uses Elyra features to help analyze COVID-19 related datasets. He will introduce Elyra, a project built to extend JupyterLab with AI-centric capabilities, and showcase the extensions that allow you to build Notebook Pipelines and execute them in a Kubeflow environment, run notebooks as batch jobs, and create, edit and execute Python scripts directly from JupyterLab.
In this video from the OpenFabrics Workshop, Todd Rimmer from Intel presents: Omni-Path Status, Upstreaming and Ongoing Work.
"Intel Omni-Path was first released in early 2016. Omni-Path host and management software is all open sourced. This session will provide an overview of Omni-Path including some of the technical capabilities and performance results as well as some recent industry results. The session will also highlight some of the areas of change and challenges encountered when adding Omni-Path into Open Fabrics and how they have been addressed as well as ongoing work in order to support Omni-Path within the existing Open Fabrics architecture."
Watch the video presentation: http://wp.me/p3RLHQ-gA0
Learn more: http://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-fabric-overview.html
and
https://www.openfabrics.org/index.php/abstracts-agenda.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Journey Through Four Stages of Kubernetes Deployment Maturity - Altoros
In this webinar we will discuss a crawl, walk, run approach to continuous delivery (CD) for applications, point by point:
Where to start, how to advance, and how to reach the level of maximum automation.
How to orchestrate CI/CD processes along with routing and business continuity.
When the automation level is sufficient.
GitOps principles and their benefits.
What tools should be used to automate CI, CD, GitOps, container registry, secrets management, etc.
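The GitOps principle listed above can be made concrete with a small sketch: the desired state lives in a Git repository, and an agent continuously reconciles the live environment toward it. All names and state shapes below are illustrative, not any particular tool's API.

```python
# Illustrative sketch of GitOps reconciliation: diff the declared (Git) state
# against the observed (live) state and derive the converging actions.

def reconcile(desired: dict, live: dict) -> list:
    """Return the actions needed to converge live state to desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name, spec))
        elif live[name] != spec:
            actions.append(("update", name, spec))
    for name in live:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

# Example: the repo declares two deployments; the cluster has a stale one.
desired = {"web": {"replicas": 3}, "worker": {"replicas": 1}}
live = {"web": {"replicas": 2}, "legacy": {"replicas": 1}}
for action in reconcile(desired, live):
    print(action)
```

A real agent (e.g. in the Flux or Argo CD family) runs this loop on a schedule or on repository webhooks, which is what makes Git the single source of truth.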
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery", Stepan Pu... - Provectus
What is a machine learning workflow? What open-source tools can you use to automate the ML workflow?
Reproducible ML pipelines in research and production with monitoring insights from live inference clusters could enable and accelerate the delivery of AI solutions for enterprises. There is a growing ecosystem of tools that augment researchers and machine learning engineers in their day to day operations.
Still, there are big gaps in the machine learning workflow when it comes to training dataset versioning, training performance and metadata tracking, integration testing, inferencing quality monitoring, bias detection, concept drift detection and other aspects that prevent the adoption of AI in organizations of all sizes.
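One of the gaps named above, training-dataset versioning, can be sketched in a few lines: derive a stable content hash so a training run can record exactly which data it saw. The record format and hash-prefix length here are illustrative choices, not a standard.

```python
# Minimal sketch of content-addressed dataset versioning: the same records
# always produce the same version string, regardless of input order.
import hashlib

def dataset_version(records: list) -> str:
    """Hash records (bytes) in a stable order so identical data yields an identical version."""
    digest = hashlib.sha256()
    for record in sorted(records):
        digest.update(record)
    return digest.hexdigest()[:12]

v1 = dataset_version([b"row-1", b"row-2"])
v2 = dataset_version([b"row-2", b"row-1"])   # order-insensitive: same version
v3 = dataset_version([b"row-1", b"row-2", b"row-3"])
print(v1, v1 == v2, v1 != v3)
```

Logging this version next to model metrics is the cheapest way to make an experiment reproducible; dedicated tools (DVC, MLflow artifacts) build on the same idea.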
Code Hosting: The Key to Autonomous, Self-Service Development - Rachel Maxwell
While developers differ in their selection of tools and coding language of choice, many face the same challenges that hinder productivity. Oftentimes, developers have to use multiple systems to manage their source code and project artifacts. Couple that with the highly distributed nature of today’s work environment, and it’s no wonder development teams lack visibility and a holistic view of the entire software development lifecycle. This can be problematic, especially in light of increasingly shorter turnaround times for bringing products to market.
Code hosting and collaboration platforms, on the other hand, solve those challenges. We will talk about why these platforms are needed for today’s developers and how they create a consolidated environment that allows developers to be autonomous, and as a result, more productive.
Specifically, we will delve into the following benefits of code hosting platforms:
• Self-Service: Rather than waiting on IT, project managers can fulfill their own requests through fine-grained permissions and delegated user management with the LDAP/AD system.
• Developer Flexibility: Modern platforms are now accommodating multi-repos and repository types (e.g., Git, Subversion, Maven, etc.) in one project. This is allowing development teams to exploit all of their development resources while maintaining a single source of truth.
• Automation: Code hosting platforms automate processes (e.g., build notifications, repo creation) to the greatest extent possible, enabling developers to focus on developing the actual software.
• Seamless Collaboration: Code hosting platforms streamline code reviews with merge request code reviews and flexible developer workflows. What’s more, users can set permission on feature branch workflows so approvals are required to merge changes.
• Continuous Integration and Continuous Delivery: Rather than isolating development from downstream processes, code hosting platforms synchronize with the continuous integration server and development tool chain (e.g., Jira, Slack, Jenkins) for continuous delivery.
BYOP: Custom Processor Development with Apache NiFi - DataWorks Summit
Apache NiFi, a robust, scalable, and secure tool for data flow management, ships with over 212 processors to ingest, route, manipulate, and deliver data across a variety of sources and consumers. But many users turn to NiFi to meet unusual requirements — from proprietary protocol parsing, to running inside connected cars, to offloading massive hardware metrics from oil rigs in the most remote environments. Rather than posting a community request for custom development or offloading unusual demands to unnecessary external systems, there’s an answer in NiFi. Learn how NiFi allows you to quickly prototype custom processors in the scripting language of your choice against live production data without affecting your existing flows. Easily translate prototypes to full-fledged processors to optimize performance and leverage the full provenance reporting infrastructure. Discover how the framework provides conventions to streamline your development and minimize common boilerplate code, and a robust testing framework to make testing easy and, dare we say, fun.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The intended audience will have experience with programming in Groovy, Ruby, Jython, ECMAScript/Javascript, or Lua.
Takeaways: Attendees will gain an understanding of writing custom processors for Apache NiFi, including the component lifecycle, unit and integration testing, quick prototyping using a scripting language of their choice, and the artifact publishing and deployment process.
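The processor contract the talk describes can be sketched outside NiFi. Real scripted processors run inside the framework (e.g. via ExecuteScript, where the framework supplies the session); here the session and flow-file objects are minimal stand-ins I invented so the transform logic can be unit-tested on its own, in the spirit of NiFi's testing framework.

```python
# A rough, framework-free sketch of a NiFi-style processor: pull a flow file
# from the session, transform it, and route it to a success/failure relationship.

class FlowFile:
    def __init__(self, content: bytes, attributes=None):
        self.content = content
        self.attributes = dict(attributes or {})

class Session:
    """Stand-in for the process session a real framework would provide."""
    def __init__(self, queue):
        self.queue = list(queue)
        self.transferred = {"success": [], "failure": []}

    def get(self):
        return self.queue.pop(0) if self.queue else None

    def transfer(self, flow_file, relationship):
        self.transferred[relationship].append(flow_file)

def on_trigger(session: Session) -> None:
    """Uppercase the content, stamp an attribute, and route by outcome."""
    flow_file = session.get()
    if flow_file is None:
        return
    try:
        flow_file.content = flow_file.content.upper()
        flow_file.attributes["processed"] = "true"
        session.transfer(flow_file, "success")
    except Exception:
        session.transfer(flow_file, "failure")

session = Session([FlowFile(b"hello nifi")])
on_trigger(session)
print(session.transferred["success"][0].content)
```

Keeping the transform pure against stub objects like these is what makes the "prototype in a scripting language, then harden" path practical.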
Open-source vs. public cloud in the Big Data landscape. Friends or Foes? - GetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
A presentation about the strong competition between open-source vendors and public cloud providers in the Big Data landscape.
Present and future of unified, portable, and efficient data processing with A... - DataWorks Summit
The world of big data involves an ever-changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is a glue that can connect the big data ecosystem together; it enables users to "run any data processing pipeline anywhere."
This talk will briefly cover the capabilities of the Beam model for data processing and discuss its architecture, including the portability model. We’ll focus on the present state of the community and the current status of the Beam ecosystem. We’ll cover the state of the art in data processing and discuss where Beam is going next, including completion of the portability framework and Streaming SQL. Finally, we’ll discuss areas of improvement and how anybody can join us on the path of creating the glue that interconnects the big data ecosystem.
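The Beam model the talk covers can be shown in miniature: a pipeline is a graph of transforms over collections. The code below is a plain-Python simulation of two core primitives (ParDo and GroupByKey), not the Beam SDK itself, just to make the model concrete with the classic word count.

```python
# Plain-Python simulation of the Beam model's ParDo and GroupByKey primitives.
from collections import defaultdict

def par_do(pcollection, fn):
    """Apply fn to each element; fn may emit zero or more outputs."""
    return [out for element in pcollection for out in fn(element)]

def group_by_key(pcollection):
    """Group (key, value) pairs by key, as a runner would after a shuffle."""
    groups = defaultdict(list)
    for key, value in pcollection:
        groups[key].append(value)
    return sorted(groups.items())

lines = ["to be or", "not to be"]
words = par_do(lines, lambda line: line.split())          # flatten lines to words
pairs = par_do(words, lambda word: [(word, 1)])           # key each word
counts = [(word, sum(ones)) for word, ones in group_by_key(pairs)]
print(counts)
```

In real Beam the same two primitives are what every runner (Flink, Spark, Dataflow, ...) must implement, which is exactly why one pipeline definition can "run anywhere".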
Speaker
Davor Bonaci, Apache Software Foundation; Simbly, V.P. of Apache Beam; Founder/CEO at Operiant
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl - ITCamp
Storage Spaces Direct provides new possibilities for Microsoft's Hyper-V hypervisor. On the one hand, a high-performance, highly available Scale-Out File Server that can use internal, non-shared disks such as SATA HDDs, SSDs and even NVMe devices. On the other hand, you can build a hyper-converged Hyper-V cluster where the VMs and their storage run on the same servers. And let’s not forget Azure Stack! The first version of Microsoft's private/hosted cloud solution will only be supported on the hyper-converged S2D infrastructure. Join this session to learn about this great new technology and the role it will play in future private and hosted cloud infrastructure implementations.
Perforce Helix Never Dies: DevOps at Bandai Namco Studios - Perforce
Traditionally at Bandai Namco Studios, there has been no unified version control system in place and teams could choose to use any VCS system for their game titles—Subversion, Git, AlienBrain, or none at all. I’ll talk about why Bandai Namco Studios chose to standardize on Perforce Helix, show how we develop LiveOps-type mobile applications using the Unity game engine, and the advantages we gain from centrally managing code and assets in Helix.
The Libre-SOC Project aims to create an entirely Libre-Licensed, transparently-developed fully auditable Hybrid 3D CPU-GPU-VPU, using the Supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, as a proof-of-concept for the team, whose primary expertise is in Software Engineering. Software Engineering training brings a radically different approach to Hardware development: extensive unit tests, source code revision control, automated development tools are normal. Libre Project Management brings even more: bug trackers, mailing lists, auditable IRC logs and a wiki are standard fare for Libre Projects that are simply not normal Industry-Standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, by following a parallel development process involving "Real" and "Symbolic" Cell Libraries developed by Chips4Makers, it will be shown how our developers did not need to sign a Foundry NDA, yet were still able to work side-by-side with a University that did. With this parallel development process, the University upheld its NDA obligations, and Libre-SOC was simultaneously able to honour its Transparency Objectives.
Modern software development is increasingly taking a “microservice” approach that has resulted in an explosion of complexity at the network level. We have more applications running distributed across different datacenters. Distributed tracing, events, and metrics are essential for observing and understanding modern microservice architectures.
This talk is a deep dive on how to monitor your distributed system. You will get tools, methodologies, and experiences that will help you realize what your applications expose and how to get value out of all this information.
Gianluca Arbezzano, SRE at InfluxData, will share how to monitor a distributed system and how to move from a traditional monitoring approach to observability. Focus on a server's role rather than its hostname: hostnames no longer matter much, because servers and containers are fast-moving parts, and in case of trouble it's easier to detach one than to nurse it back by name like a cute puppet. He will also cover how to design SLOs for your core services and how to iterate on them, and how to instrument your services with tracing tools like Zipkin or Jaeger to measure latency across your network.
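Designing an SLO, as mentioned above, usually comes down to an error-budget calculation: the target availability defines how many failures a window may absorb. The numbers below are illustrative, not from the talk.

```python
# Sketch of SLO error-budget accounting: given a target availability and
# observed failures, compute how much of the budget a service has consumed.

def error_budget_consumed(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget used (1.0 means the budget is exhausted)."""
    budget = (1.0 - slo_target) * total        # failures allowed in the window
    return failed / budget if budget else float("inf")

# A 99.9% SLO over 1,000,000 requests allows 1,000 failures.
consumed = error_budget_consumed(0.999, 1_000_000, 250)
print(f"{consumed:.0%} of the error budget used")
```

Iterating on the SLO then means adjusting `slo_target` until the budget matches what users actually tolerate, rather than alerting on raw host metrics.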
Agenda:
• Brief overview of the Spark-provided spark-shell and spark-submit
• Overview of Spark Context
• Overview of Zeppelin and Jupyter notebooks for Spark
• Introduction to IBM Spark Kernel
• Introduction to Cloudera Livy and Spark JobServer
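As a taste of the Livy item in the agenda above: Livy exposes a REST interface where a Spark batch job is submitted by POSTing a JSON body to `/batches`. The host, paths and class name below are placeholders, and the actual HTTP request is left out so the sketch stays offline.

```python
# Sketch of building the JSON body for Livy's POST /batches endpoint.
import json

LIVY_URL = "http://livy-host:8998/batches"   # placeholder host and port

def batch_payload(app_file: str, main_class: str, args: list) -> str:
    payload = {
        "file": app_file,         # application jar or .py file on the cluster
        "className": main_class,  # entry point (omitted for PySpark scripts)
        "args": args,
        "conf": {"spark.executor.instances": "2"},
    }
    return json.dumps(payload)

body = batch_payload("hdfs:///jobs/etl.jar", "com.example.EtlJob", ["2016-01-01"])
print(body)
```

Sending `body` with `Content-Type: application/json` to `LIVY_URL` returns a batch id that can then be polled for state and logs.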
Github Link:
Previous meetups:
1) Introduction to Resilient Distributed Dataset and deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-and-resilient-distributed-dataset-basics-and-deep-dive
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/225159947/
Video: https://www.youtube.com/watch?v=MkeRWyF1y_0
Github: https://github.com/SatyaNarayan1/spark_meetup
2) Introduction to Spark DataFrames/SQL and Deep dive
Slides: http://www.slideshare.net/sachinparmarss/deep-dive-spark-data-frames-sql-and-catalyst-optimizer
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/226419828/
Video: https://www.youtube.com/watch?v=h71MNWRv99M
Github: https://github.com/parmarsachin/spark-dataframe-demo
3) Apache Spark - Introduction to Spark Streaming and Deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-to-spark-streaming-and-deep-dive-57671774
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/227008581/
Video:
Github: https://github.com/agsachin/spark-meetup
Looking forward to a great interactive session. Do provide feedback.
Devops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform - Drew Malone
Wonder why you would want to use Terraform vs. its competitors? Why not stick with CFNs, you ask? CDK should do the trick, right? Come enjoy an opinionated take on using Terraform, for the betterment of your sanity. It also includes a light intro to Terraform for those who are new to it.
Gabriel is a Cloud Technologist and accomplished Cyber practitioner who has led and built complex workloads across the IC for 20+ years. He's a native New Yorker from Washington Heights, with a boisterous laugh and calm demeanor. Gabriel has built a strong career starting in Federal service and has evolved into CTO and now VP of IC at Applied Insight. In addition to his technical accolades, he's a social leader who believes in building and growing strong teams.
How and Why GraalVM is quickly becoming relevant for developers (ACEs@home - ... - Lucas Jellema
Starting a Java application as fast as any executable with a memory footprint rivaling the most lightweight runtime engines is quickly becoming a reality, through Graal VM and ahead of time compilation. This in turn is a major boost for using Java for microservice and serverless scenarios. The second major pillar of GraalVM is its polyglot capability: it can run code in several languages - JVM and non-JVM such as JavaScript/ES, Python, Ruby, R or even your own DSL. More importantly: GraalVM enables code running in one language to interoperate with code in another language. GraalVM supports many and increasingly more forms of interoperability. This session introduces GraalVM, its main capabilities and its practical applicability - now and in the near future. There are demonstrations of ahead of time compilation and runtime interoperability of various non-JVM languages with Java.
Bringing complex event processing to Spark streaming - DataWorks Summit
Complex event processing (CEP) is about identifying business opportunities and threats in real time by detecting patterns in data and taking appropriate automated action. Example business use cases for CEP include location-based marketing, smart inventories, targeted ads, Wi-Fi offloading, fraud detection, churn prediction, fleet management, predictive maintenance, security incident event management, and many more. While Spark Streaming provides a distributed resilient framework for ingesting events in real time, effort is still needed to build CEP applications. This is because CEP use cases require correlation of events, which in turn requires us to treat every incoming event as a discrete occurrence in time. Spark Streaming treats the entire batch of events as single occurrence. Many CEP use cases also require alerts to be fired even when there is no incoming event. An example of such use case is to fire an alert when an order-shipped event is NOT received within the SLA times following an order-received event. At Oracle we have adopted a few neat techniques like running continuous query engines as long running tasks, using empty batches as triggers, etc. to bring complex event processing to Spark Streaming.
Join us to learn more on CEP for Spark, the fastest growing data processing platform in the world.
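The missing-event pattern described above (fire an alert when order-shipped is NOT received within the SLA after order-received) can be sketched without any streaming framework. Event shapes, the SLA value, and the batch-style evaluation are all illustrative simplifications of the CEP technique, not Oracle's implementation.

```python
# Simplified CEP check: which received orders are past the SLA without a
# matching shipped event? A streaming engine would run this continuously,
# triggered even by empty batches so alerts fire with no incoming events.

SLA_SECONDS = 3600

def missed_sla_orders(events, now):
    """events: iterable of (timestamp, kind, order_id). Returns overdue orders."""
    received, shipped = {}, set()
    for ts, kind, order_id in events:
        if kind == "received":
            received[order_id] = ts
        elif kind == "shipped":
            shipped.add(order_id)
    return sorted(
        oid for oid, ts in received.items()
        if oid not in shipped and now - ts > SLA_SECONDS
    )

events = [
    (0, "received", "A"), (100, "shipped", "A"),   # shipped in time
    (200, "received", "B"),                        # never shipped
    (3000, "received", "C"),                       # still inside the SLA
]
print(missed_sla_orders(events, now=4000))
```

Note how the check can raise an alert for order "B" even though no new event arrived, which is exactly why the talk's empty-batch trigger technique matters.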
Speakers
Prabhu Thukkaram, Senior Director, Product Development, Oracle
Hoyong Park, Architect, Oracle
OpenPOWER Acceleration of HPCC Systems - HPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines - confluent
ETL can be painful with dirty data and outdated batch processes slowing you down; there has to be a better way. In this talk we’ll discuss the benefits of introducing a streaming platform to your architecture including how it can greatly simplify complexity, speed up performance, and help your team deliver the features they need with real-time data integration.
Pandora’s Lawrence Weikum will discuss what they’ve done to bring real-time data integration to the team. We’ll review their Kafka-powered data pipelines and how they make the most of Kafka’s Connect API to make it surprisingly simple to keep systems in sync.
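To make the Connect API mentioned above concrete, here is a sketch of a source-connector configuration that continuously copies new database rows into Kafka topics. The connection details, table and connector names are placeholders; the property keys follow the Confluent JDBC source connector's documented names, but check the connector docs for your own setup.

```python
# Sketch of a Kafka Connect JDBC source connector configuration; the JSON
# body would be POSTed to the Connect REST API's /connectors endpoint.
import json

connector = {
    "name": "orders-source",                              # placeholder name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db-host:5432/shop",  # placeholder
        "mode": "incrementing",            # only pull rows with a new id
        "incrementing.column.name": "id",
        "table.whitelist": "orders",
        "topic.prefix": "db.",             # topic becomes db.orders
    },
}
print(json.dumps(connector, indent=2))
```

The appeal of this approach is that keeping systems in sync becomes declarative: no batch ETL code, just a config the Connect cluster runs and restarts for you.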
Presented by:
Lawrence Weikum, Senior Software Engineer, Pandora
Gehrig Kunz, Technical Product Marketing Manager, Confluent
Journey Through Four Stages of Kubernetes Deployment MaturityAltoros
In this webinar we will discuss a crawl, walk, run approach to continuous delivery (CD) for applications, point by point:
Where to start, how to advance, and how to reach the level of maximum automation.
How to orchestrate CI/CD processes along with routing and business continuity.
When the automation level is sufficient.
GitOps principles and their benefits.
What tools should be used to automate CI, CD, GitOps, Container Registry, Secrets management, etc
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...Provectus
What's a machine learning workflow? What open source tools can you use to automate ML workflow?
Reproducible ML pipelines in research and production with monitoring insights from live inference clusters could enable and accelerate the delivery of AI solutions for enterprises. There is a growing ecosystem of tools that augment researchers and machine learning engineers in their day to day operations.
Still, there are big gaps in the machine learning workflow when it comes to training dataset versioning, training performance and metadata tracking, integration testing, inferencing quality monitoring, bias detection, concept drift detection and other aspects that prevent the adoption of AI in organizations of all sizes.
Code Hosting: The Key to Autonomous, Self-Service DevelopmentRachel Maxwell
While developers differ in their selection of tools and coding language of choice, many face the same challenges that are hindering productivity. Often times, developers have to use multiple systems to manage their source code and project artifacts. Couple that with the highly distributed nature of today’s work environment, and it’s no wonder why development teams lack visibility and a holistic view of the entire software development lifecycle. This can be problematic, especially in light of increasingly shorter turnaround times for bringing products to market.
Code hosting and collaboration platforms, on the other hand, solve those challenges. We will talk about why these platforms are needed for today’s developers and how they create a consolidated environment that allows developers to be autonomous, and as a result, more productive.
Specifically, we will delve into the following benefits of code hosting platforms:
• Self-Service: Rather than waiting on IT, project managers can fulfill their own requests through fine-grained permissions and delegated user management with the LDAP/AD system.
• Developer Flexibility: Modern platforms are now accommodating multi-repos and repository types (e.g., Git, Subversion, Maven, etc.) in one project. This is allowing development teams to exploit all of their development resources while maintaining a single source of truth.
• Automation: Code hosting platforms automate processes (e.g., build notifications, repo creation) to the greatest extent possible, enabling developers to focus on developing the actual software.
• Seamless Collaboration: Code hosting platforms streamline code reviews with merge request code reviews and flexible developer workflows. What’s more, users can set permission on feature branch workflows so approvals are required to merge changes.
• Continuous Integration and Continuous Delivery: Rather than isolating development from downstream processes, code hosting platforms synchronize with the continuous integration server and development tool chain (e.g., Jira, Slack, Jenkins) for continuous delivery.
BYOP: Custom Processor Development with Apache NiFiDataWorks Summit
Apache NiFi, a robust, scalable, and secure tool for data flow management, ships with over 212 processors to ingest, route, manipulate, and exfil data from a variety of sources and consumers. But many users turn to NiFi to meet unusual requirements — from proprietary protocol parsing, to running inside connected cars, to offloading massive hardware metrics from oil rigs in the most remote environments. Rather than posting a community request for custom development or offloading unusual demands to unnecessary external systems, there’s an answer in NiFi. Learn how NiFi allows you to quickly prototype custom processors in the scripting language of your choice against live production data without affecting your existing flows. Easily translate prototypes to full-fledged processors to optimize performance and leverage the full provenance reporting infrastructure. Discover how the framework provides conventions to streamline your development and minimize common boilerplate code, and the robust testing framework to make testing easy, and dare we say, fun.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The intended audience will have experience with programming in Groovy, Ruby, Jython, ECMAScript/Javascript, or Lua.
Takeaways: Attendees will gain an understanding in writing custom processors for Apache NiFi, including the component lifecycle, unit and integration testing, quick prototyping using a scripting language of their choice, and the artifact publishing and deployment process.
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?GetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
A presentation about the strong competition between open-source vendors and public cloud providers in the Big Data landscape.
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
The world of big data involves an ever-changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is a glue that can connect the big data ecosystem together; it enables users to "run any data processing pipeline anywhere."
This talk will briefly cover the capabilities of the Beam model for data processing and discuss its architecture, including the portability model. We’ll focus on the present state of the community and the current status of the Beam ecosystem. We’ll cover the state of the art in data processing and discuss where Beam is going next, including completion of the portability framework and the Streaming SQL. Finally, we’ll discuss areas of improvement and how anybody can join us on the path of creating the glue that interconnects the big data ecosystem.
Speaker
Davor Bonaci, Apache Software Foundation; Simbly, V.P. of Apache Beam; Founder/CEO at Operiant
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlITCamp
Storage Spaces Direct will provide new unseen possibilities for Microsoft Hypervisor Hyper-V. These are on one hand a high performant, high available Scale-Out Fileserver with the possibility to use internal not shared disks like SATA HDDs and SSDs and even NVMe Devices. On the other hand, you can build a Hyper-converged Hyper-V Cluster where the VMs and their Storage are running on the same Servers. And let’s not forget Azure Stack! The first version of Microsoft Private/Hosted Cloud solution will only be supported on the hyper-converged S2D infrastructure. Join this session to learn about this great new technology that will have its role in the future Private and Hosted Cloud infrastructure implementations.
Perforce Helix Never Dies: DevOps at Bandai Namco StudiosPerforce
Traditionally at Bandai Namco Studios, there has been no unified version control system in place and teams could choose to use any VCS system for their game titles—Subversion, Git, AlienBrain, or none at all. I’ll talk about why Bandai Namco Studios chose to standardize on Perforce Helix, show how we develop LiveOps-type mobile applications using the Unity game engine, and the advantages we gain from centrally managing code and assets in Helix.
The Libre-SOC Project aims to create an entirely Libre-Licensed, transparently-developed fully auditable Hybrid 3D CPU-GPU-VPU, using the Supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, as a proof-of-concept for the team, whose primary expertise is in Software Engineering. Software Engineering training brings a radically different approach to Hardware development: extensive unit tests, source code revision control, automated development tools are normal. Libre Project Management brings even more: bug trackers, mailing lists, auditable IRC logs and a wiki are standard fare for Libre Projects that are simply not normal Industry-Standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, by following a parallel development process involving "Real" and "Symbolic" Cell Libraries developed by Chips4Makers, it will be shown how our developers did not need to sign a Foundry NDA but were still able to work side-by-side with a University that did. With this parallel development process, the University upheld its NDA obligations, and Libre-SOC was simultaneously able to honour its Transparency Objectives.
Modern software development is increasingly taking a “microservice” approach that has resulted in an explosion of complexity at the network level. We have more applications running distributed across different datacenters. Distributed tracing, events, and metrics are essential for observing and understanding modern microservice architectures.
This talk is a deep dive on how to monitor your distributed system. You will get tools, methodologies, and experiences that will help you realize what your applications expose and how to get value out of all this information.
Gianluca Arbezzano, SRE at InfluxData, will share how to monitor a distributed system and how to switch from a traditional monitoring approach to observability. Stay focused on a server's role rather than its hostname: servers and containers are now fast-moving parts, and it is easier to detach one when trouble strikes than to treat each server as a cute pet you call by name. He will also cover how to design SLOs for your core services and iterate on them, and how to instrument your services with tracing tools like Zipkin or Jaeger to measure latency across your network.
Agenda:
• Brief overview of Spark-provided spark-shell and spark-submit
• Overview of Spark Context
• Overview of Zeppelin and Jupyter notebooks for Spark
• Introduction to IBM Spark Kernel
• Introduction to Cloudera Livy and Spark JobServer
Github Link:
Previous meetups:
1) Introduction to Resilient Distributed Dataset and deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-and-resilient-distributed-dataset-basics-and-deep-dive
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/225159947/
Video: https://www.youtube.com/watch?v=MkeRWyF1y_0
Github: https://github.com/SatyaNarayan1/spark_meetup
2) Introduction to Spark DataFrames/SQL and Deep dive
Slides: http://www.slideshare.net/sachinparmarss/deep-dive-spark-data-frames-sql-and-catalyst-optimizer
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/226419828/
Video: https://www.youtube.com/watch?v=h71MNWRv99M
Github: https://github.com/parmarsachin/spark-dataframe-demo
3) Apache Spark - Introduction to Spark Streaming and Deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-to-spark-streaming-and-deep-dive-57671774
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/227008581/
Video:
Github: https://github.com/agsachin/spark-meetup
Looking forward to having a great interactive session. Do provide feedback.
Devops Columbia October 2020 - Gabriel Alix: A Discussion on TerraformDrew Malone
Wonder why you would want to use Terraform vs its competitors? Why not stick with CFNs, you ask? CDK should do the trick, right? Come enjoy an opinionated take on using Terraform, for the betterment of your sanity. Also includes a light intro to Terraform for those who are new to it.
Gabriel is a Cloud Technologist and accomplished Cyber practitioner who has led and built complex workloads across the IC for 20+ years. He's a native New Yorker from Washington Heights, with a boisterous laugh and calm demeanor. Gabriel has built a strong career starting in Federal service and has evolved into CTO and now VP of IC at Applied Insight. In addition to his technical accolades, he's a social leader who believes in building and growing strong teams.
How and Why GraalVM is quickly becoming relevant for developers (ACEs@home - ...Lucas Jellema
Starting a Java application as fast as any executable with a memory footprint rivaling the most lightweight runtime engines is quickly becoming a reality, through Graal VM and ahead of time compilation. This in turn is a major boost for using Java for microservice and serverless scenarios. The second major pillar of GraalVM is its polyglot capability: it can run code in several languages - JVM and non-JVM such as JavaScript/ES, Python, Ruby, R or even your own DSL. More importantly: GraalVM enables code running in one language to interoperate with code in another language. GraalVM supports many and increasingly more forms of interoperability. This session introduces GraalVM, its main capabilities and its practical applicability - now and in the near future. There are demonstrations of ahead of time compilation and runtime interoperability of various non-JVM languages with Java.
Bringing complex event processing to Spark streamingDataWorks Summit
Complex event processing (CEP) is about identifying business opportunities and threats in real time by detecting patterns in data and taking appropriate automated action. Example business use cases for CEP include location-based marketing, smart inventories, targeted ads, Wi-Fi offloading, fraud detection, churn prediction, fleet management, predictive maintenance, security incident event management, and many more. While Spark Streaming provides a distributed resilient framework for ingesting events in real time, effort is still needed to build CEP applications. This is because CEP use cases require correlation of events, which in turn requires us to treat every incoming event as a discrete occurrence in time. Spark Streaming treats the entire batch of events as single occurrence. Many CEP use cases also require alerts to be fired even when there is no incoming event. An example of such use case is to fire an alert when an order-shipped event is NOT received within the SLA times following an order-received event. At Oracle we have adopted a few neat techniques like running continuous query engines as long running tasks, using empty batches as triggers, etc. to bring complex event processing to Spark Streaming.
Join us to learn more on CEP for Spark, the fastest growing data processing platform in the world.
Speakers
Prabhu Thukkaram, Senior Director, Product Development, Oracle
Hoyong Park, Architect, Oracle
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelinesconfluent
ETL can be painful with dirty data and outdated batch processes slowing you down; there has to be a better way. In this talk we’ll discuss the benefits of introducing a streaming platform to your architecture including how it can greatly simplify complexity, speed up performance, and help your team deliver the features they need with real-time data integration.
Pandora’s Lawrence Weikum will discuss what they’ve done to bring real-time data integration to the team. We’ll review their Kafka-powered data pipelines and how they make the most of Kafka’s Connect API to make it surprisingly simple to keep systems in sync.
Presented by:
Lawrence Weikum, Senior Software Engineer, Pandora
Gehrig Kunz, Technical Product Marketing Manager, Confluent
Latest (storage IO) patterns for cloud-native applications OpenEBS
Applying microservice patterns to storage gives each workload its own Container Attached Storage (CAS) system. This puts the DevOps persona in full control of the storage requirements and brings data agility to k8s persistent workloads. We will go over the concept and the implementation of CAS, as well as its orchestration.
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone Systems
Modern workloads are incredibly diverse and so are architectures. No single architecture is best for every workload. Maximizing performance takes a mix of architectures deployed in CPU, GPU, FPGA, and other future accelerators. Intel® oneAPI products deliver the tools needed to deploy applications and solutions across SVMS architectures. Learn about oneAPI and how they can be used in multiple domains including HPC, IoT, Data Science, and AI.
Introduction to HPC & Supercomputing in AITyrone Systems
Catch up with our live webinar on Natural Language Processing! Learn about how it works and how it applies to you. We have provided all the information in our video recording, so you won't miss out.
Watch the Natural Language Processing webinar here!
ScicomP 2015 presentation discussing best practices for debugging CUDA and OpenACC applications with a case study on our collaboration with LLNL to bring debugging to the OpenPOWER stack and OMPT.
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems
Presenter: John Andleman, Staff Database Engineer, Citrix
In this session, John will share some interesting use cases leveraging the HPCC Systems platform, including those beyond traditional big data uses. John will also share his roadmap of HPCC projects being planned for the next few months and why he feels HPCC Systems is a more suitable solution than Hadoop based on experiences and lessons learned.
NOTE: The video of this presentation is the 3rd one shown in the accompanying YouTube link.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in the different parts of the DevOps infinity loop.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
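The idea of eliminating uninteresting seed bytes can be illustrated with a toy Python sketch. This is not DIAR's actual algorithm, just a greedy stand-in, and `toy_coverage` here is a hypothetical substitute for real instrumentation feedback (edge counts from an instrumented target):

```python
def minimize_seed(seed: bytes, coverage) -> bytes:
    """Greedily drop bytes whose removal leaves the coverage signature unchanged.

    `coverage` maps an input to a hashable signature of observed program
    behaviour (e.g. the set of edges hit by an instrumented target).
    """
    baseline = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]
        if coverage(candidate) == baseline:
            seed = candidate   # byte i was uninteresting; drop it
        else:
            i += 1             # byte i affects behaviour; keep it
    return seed

def toy_coverage(data: bytes):
    # Stand-in "program": only ASCII letters influence behaviour.
    return bytes(b for b in data if 65 <= b <= 122)
```

A lean seed produced this way gives every subsequent mutation a better chance of touching bytes the program actually reacts to.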
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
2. Agenda
1. The ‘Meta’ Issue: what is driving all of this?
2. Scalable Infrastructure
3. Scalable Software
4. Compliance
5. Intro to BioTeam: who, what, why
6. Q&A
3. Who, What, Why ...
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ 10+ years bridging the “gap” between science, IT & high performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...
5. Culture
BioTeam
‣ We are a distributed company
• BioTeam is 100% REMOTE
• All employees are MANAGERS
• Workflow is mostly ASYNCHRONOUS
‣ Prefer small interdisciplinary TEAMS
• Value placed on TRUST and PERFORMANCE
6. Today
BioTeam
‣ 10 full-time employees in 2014
• 2 dedicated to HPC Infrastructure
• 2 dedicated to Software Development
• 1 dedicated to Products
• 1 dedicated to Government Services
• 1 dedicated to Cloud Computing
‣ 10+ years supporting Life Sciences Research
10. Amazon vs. Other Clouds
‣ AWS has by far the most useful IaaS building blocks today
• First choice for most Bio-IT use cases
‣ AWS quietly rolls out killer features
• Spot Market
• Virtual Private Cloud
‣ Provider decision may be based on where your data actually resides
12. Massive resources and APIs galore
Google
‣ Google started with PaaS and worked down
‣ Google Exacycle for Visiting Faculty (closed)
• 1 billion core hours on demand; what’s next?
‣ Google is DEVELOPER centric; everything has an API
‣ Culture is based on Science and Engineering
14. Devops
Configuration Management
‣ Required in almost every cloud project
‣ Chef/Puppet/Ansible/Fabric
• Domain specific languages; Agent-based versus SSH; Abstraction
‣ Key is reducing institutionalized knowledge and sharing recipes
‣ Docker/lxc could be disrupting
• Lightweight differential images; not very HPC friendly at this point
‣ Orchestration tools lagging behind provisioning and configuration
‣ Best techniques are making their way back into HPC
16. open-source cluster computing toolkit
MIT StarCluster
‣ Ideal for most HPC use cases
• Includes Grid Engine, NFS, and MPI
• NEW Support for Virtual Private Cloud!
‣ Works with Spot Instances
‣ Extensible via plugins
• Hadoop
• HTCondor
• GlusterFS
• IPython Notebook
22. In modern processors and coprocessors
Types of Parallelism
‣ Instruction Level: micro-architectural techniques such as pipelined execution, out-of/in-order execution, super-scalar execution, branch prediction…
‣ Vector Level: using SIMD vector processing instructions for SSE, AVX, Phi
‣ Thread Level: multi-core architectures with or without Hyper-Threading; many-core architecture with smart round-robin hardware multithreading
‣ Node Level: distributed computing; cluster computing
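The four levels above sit at different layers: the instruction- and vector-level forms are handled by the hardware and compiler, while thread- and node-level parallelism require the programmer to decompose the problem. Here is a minimal Python sketch of that decomposition step (the function names are ours; note that CPython's GIL means threads will not actually speed up this pure-Python loop, so real deployments use processes or native code, but the chunking pattern is the same):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_squares(bounds):
    # Each worker handles one independent slice of the index range.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_squares(n, workers=4):
    # Thread/node-level parallelism: split [0, n) into contiguous chunks.
    step = max(1, n // workers)
    chunks, start = [], 0
    while start < n:
        chunks.append((start, min(start + step, n)))
        start += step
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves chunk order; combine partial results at the end.
        return sum(pool.map(partial_sum_squares, chunks))
```

The same chunking logic carries over unchanged whether the workers are OS threads, processes, or cluster nodes; only the executor changes.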
23. Fully functional multi-thread execution unit
Intel Xeon Phi Coprocessor
‣ 50+ cores with a ring interconnect
‣ 64-bit addressing
‣ Scalar unit based on Intel Pentium family
‣ Vector unit 512-bit SIMD Instructions
‣ 4 hardware threads per core
‣ Highly Parallel device
‣ SMP on-a-chip
24. Choices
Programming Xeon Phi
Offloaded:
‣ Pragma/directives based
‣ Better serial processing
‣ More memory
‣ Better file access
Native:
‣ Makes full use of available resources
‣ Simpler programming model
‣ Quicker to test key kernels
‣ Some constraints: memory availability, file I/O access
25. Mapping with Burrows-Wheeler Aligner (BWA)
Intel Optimization Example
[Speedup chart: Xeon (baseline) 1×, Xeon (optimized) 1.24×, Xeon + Phi 1.86×]
‣ Replace pthreads with OpenMP
‣ Better load balancing
‣ Overlap I/O and Compute
‣ Better thread usage
‣ Efficient memory allocation
‣ Vectorized performance-critical loops
‣ Data prefetch to reduce memory latency
Source: Life Sciences Optimization - Intel - SC13
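The "Overlap I/O and Compute" bullet above is a general pattern worth spelling out. A minimal Python sketch (BWA itself is C; all names here are illustrative): a background thread prefetches the next chunk of input into a bounded queue while the main thread computes on the current one.

```python
import queue
import threading

def read_chunks(source, out_q):
    # Producer: stream input chunks into a bounded queue so reading
    # stays ahead of, but does not outrun, the compute stage.
    for chunk in source:
        out_q.put(chunk)
    out_q.put(None)  # sentinel: no more input

def process_overlapped(source, compute):
    q = queue.Queue(maxsize=4)   # bound keeps memory use flat
    reader = threading.Thread(target=read_chunks, args=(source, q))
    reader.start()
    results = []
    while True:
        chunk = q.get()          # compute waits only when I/O is behind
        if chunk is None:
            break
        results.append(compute(chunk))
    reader.join()
    return results
```

Because reading and computing run concurrently, total wall time approaches max(I/O time, compute time) instead of their sum.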
26. Protein sequence analysis with MPI-HMMER
Intel Optimization Example
[Speedup chart: Xeon 1×, Xeon + Phi 1.56×]
‣ No source code changes required
‣ Use #pragma unroll to improve loop performance
‣ Double nested loop in Viterbi algorithm is auto-vectorized for Xeon and Phi by Intel compilers
Source: Life Sciences Optimization - Intel - SC13
27. Assembly with Velour
Intel Optimization Example
‣ Intel and UIUC released open-source alternative to velveth
‣ > 10x reduction in memory usage
• Intelligently caching portions of assembly to disk
• 700GB to 60GB
‣ https://github.com/jjcook/velour
‣ Cook, Jeffrey J. 2011. Scaling short read de novo DNA sequence assembly to gigabase genomes.
28. Recommendations
Programming Xeon Phi
‣ Host can have multiple Phi cards
‣ MKL libraries are pre-optimized
‣ OpenMP is applicable to multi-core and many-core programming
• omp offload target(mic)
‣ MPI supports distributed computation and combines with other models
• OpenMP within nodes and MPI between nodes
‣ Xeon optimizations translate well to Phi
29. In the Life Sciences
Parallel Programming
‣ Targets: CPU, Coprocessors, GPU, FPGA, ASIC
‣ There is no silver bullet
‣ Problem decomposition is the most critical step
‣ Think in parallel
‣ Using Intel compilers can yield ~30% speedup in many cases
• vtune and other analysis tools are available
‣ Must optimize at one or more levels
31. Recommendations
Parallel Programming
‣ Leaving performance on the table
• Low-hanging fruit: splitting input files into parts
• Avoid using languages with a poor concurrency model and GIL
‣ Exploit thread-level parallelism
• Use multi-threading and multi-processing to fully utilize multicore processors
‣ Use Intel’s auto-vectorizing compiler
• Take advantage of SIMD parallelism and wider vectors on Phi
‣ Prepare for a heterogeneous many-core future
• Hybrid Programming (OpenMP + MPI)
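The "low-hanging fruit" of splitting input files into parts just needs a record-aware splitter, so that each worker receives whole records. A minimal sketch, assuming the records (e.g. one sequence per line) have already been read into a list; the function name is ours:

```python
def split_records(records, parts):
    """Split a list of records into `parts` contiguous chunks whose
    sizes differ by at most one, preserving record boundaries."""
    base, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # spread the remainder evenly
        chunks.append(records[start:start + size])
        start += size
    return chunks
```

Each chunk can then be handed to its own process (e.g. via multiprocessing) or its own cluster node, which also sidesteps the GIL limitation noted in the same slide.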
32. Platforms
Parallel Programming
‣ Intel Distribution for Apache Hadoop
• Enhances open-source Hadoop on Xeon processors
• More efficient; faster startup times
• Management tools
‣ Intel Enterprise Edition Lustre
• Enhances open-source Lustre
• REST API
• Hadoop Adapter
33. A fresh approach to technical computing
I <3 Julia
‣ Homoiconic; dynamic type system
‣ Designed for parallelism and distributed computation
‣ MATLAB-like syntax and extensive math library
‣ Call C functions directly
‣ Call Python functions
‣ IJulia Notebook
‣ Open Source
35. Overview
Compliance
‣ Need a compliance apparatus
‣ Often a barrier to competition
‣ Compute and Storage are easy
• Policy and procedures are harder
‣ AWS and Google will now sign BAA
36. Strategy
Compliance
‣ Keys are protecting data and preventing access
‣ Data management - points of control
‣ Encrypt data in flight and at rest
• Use S3 server-side encryption
• Google Persistent Disks are automatically encrypted
‣ Use credential rotation policies
‣ Lock down security groups and firewalls
‣ Use VPN for all public connections
‣ Log everything and audit often