A session in the DevNet Zone at Cisco Live, Berlin. There are several new OpenStack projects/services built on the core OpenStack infrastructure services. This session will first briefly discuss the changes introduced to the project governance structure in OpenStack. The focus of the presentation will then be to provide feature and architecture details on a few of the new projects and services in OpenStack: Trove (Database Service), Sahara (Data Processing Service), Congress (Policy Service) and Magnum (Container Service). A summary of other OpenStack-related services will also be provided.
Using Apache Spark in the Cloud—A DevOps Perspective with Telmo Oliveira (Spark Summit)
Toon is a leading brand in the European smart energy market, currently expanding internationally, providing energy usage insights, eco-friendly energy management and smart thermostat use for the connected home. As value-added services become ever more relevant in this market, we need to ensure that we can easily and safely on-board new tenants into our data platform. In this talk we're going to guide you through a less-discussed side of using Spark in production: devops. We will speak about our journey from an on-premise cluster to a managed solution in the cloud. A lot of moving parts were involved: ETL flows, data sharing with 3rd parties and data migration to the new environment. Add to this the need for a multi-tenant environment, a revamped toolset and a live, public-facing service. It's easy to find great examples of how Spark is used for data science purposes. On the data engineering side, we need to deploy production services; ensure data is cleaned, secured and available; and keep the data science teams happy. We'd like to share some of the choices we made and some of the lessons learned from this (ongoing) transition.
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J... (Spark Summit)
Since April 2016, Spark-as-a-service has been available to researchers in Sweden from the Swedish ICT SICS Data Center at www.hops.site. Researchers work in an entirely UI-driven environment on a platform built with only open-source software.
Spark applications can be deployed either as jobs (batch or streaming) or written and run directly from Apache Zeppelin. Spark applications run within a project on a YARN cluster, with the novel property that they are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics; that is, Kafka topics are protected from access by users who are not members of the project. In this talk we will discuss the challenges in building multi-tenant Spark streaming applications on YARN that are metered and easy to debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running Spark streaming applications, how we use Grafana and Graphite for monitoring them, and how users can debug and optimize terminated Spark Streaming jobs using Dr. Elephant. We will also discuss the experiences of our users (over 120 as of Sept 2016): how they manage their Kafka topics and quotas, patterns for sharing topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.
To conclude, we will also give an overview of our course ID2223 on Large Scale Learning and Deep Learning, in which 60 students designed and ran SparkML applications on the platform.
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz (Databricks)
Today, there are several compliance use cases - archiving, e-discovery, supervision and surveillance, to name a few - that appear naturally suited to Hadoop workloads but haven't seen wide adoption. In this session, you'll learn about common limitations, how Apache Spark helps, and some new blueprints for modernizing this architecture and disrupting existing solutions. Additionally, we'll review the rising role of Apache Spark in this ecosystem, leveraging machine learning and advanced analytics in a space that has traditionally been restricted to fairly rote reporting.
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira (Databricks)
Data integration is a really difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it, and because the problem remains challenging despite 40 years of attempts to solve it. All we want is a service that is reliable, handles all kinds of data, integrates with all kinds of systems, is easy to manage and scales as our systems grow. Oh, and it should have super low latency too. Is that too much to ask?
In this presentation, we'll discuss the basic challenges of data integration and introduce a few design and architecture patterns used to tackle them. We will then explore how these patterns can be implemented using Apache Kafka. Difficult problems are difficult and we offer no silver bullets, but we will share pragmatic solutions that have helped many organizations build fast, scalable and manageable data pipelines.
Eric Loyd - Fractal Nagios - Learn how Nagios XI can be used to monitor Nagios Log Server (NLS) and Nagios Network Analyzer (NNA), how Nagios Log Server and Nagios Network Analyzer can leverage Nagios XI for alerting, and how to use Nagios Log Server and Nagios Network Analyzer to monitor each other and Nagios XI and Nagios Core, including remote execution environments.
How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str... (Lightbend)
In this webinar, Dustin Lyons, Engineering Manager at Credit Karma, discusses how his team faced a common challenge shared by many financial services architects and engineering leaders: how to move from offline, batch-mode processing of Big Data to streaming Fast Data, and how to enable real-time decision making based on the information flowing in from over 60 million members.
Dustin reviews how his team migrated away from PHP and successfully implemented Akka Streams with Apache Kafka to ingest, process and route real-time events throughout their data ecosystem. At the end of this presentation, you’ll better understand:
* The design considerations for new Fast Data architectures, from streaming to microservices to real-time analysis.
* Some lessons learned when it comes to progressing from batch to streaming using Akka, Spark and Kafka
* Why Akka’s self-healing actor model and the resilience that it provides is actually what matters most when delivering real-time customer experiences
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic... (Spark Summit)
Is your job running slower than usual? Do you want to make sense of the thousands of Hadoop and Spark metrics? Do you want to monitor the performance of your flows, get alerts and auto-tune them? These are common questions every Hadoop user asks, but there is no single solution that addresses them. We at LinkedIn faced many such issues and built a simple self-serve tool for Hadoop users called Dr. Elephant. Dr. Elephant, which is already open sourced, is a performance monitoring and tuning tool for Hadoop and Spark. It improves developer productivity and cluster efficiency by making it easier to tune jobs. Since being open sourced, it has been adopted by multiple organizations and followed with a lot of interest in the Hadoop and Spark community. In this talk, we will discuss Dr. Elephant and outline our efforts to expand its scope into a comprehensive monitoring, debugging and tuning tool for Hadoop and Spark applications. We will talk about how Dr. Elephant performs exception analysis, gives clear and specific tuning suggestions, tracks metrics and monitors their historical trends. Open source: https://github.com/linkedin/dr-elephant
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal... (Spark Summit)
As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to an attacker who has compromised the operating system or hypervisor. Trusted hardware such as Intel SGX has recently become available in latest-generation processors. Such hardware enables arbitrary computation on encrypted data while shielding it from a malicious OS or hypervisor. However, it still suffers from a significant side channel: access pattern leakage.
We present Opaque, a package for Apache Spark SQL that enables very strong security for SQL queries: data encryption, computation verification, and access pattern leakage protection (a.k.a. obliviousness). Opaque achieves these guarantees by introducing new oblivious distributed relational operators that provide a 2000x performance gain over state-of-the-art oblivious systems, as well as novel query planning techniques for these operators implemented using Catalyst.
Making a team of data engineers and data scientists productive is a challenging task. The size of the data in "big data" problems is the first great hindrance to productivity. Apache Spark provides a great foundation for solving this problem by offering an interactive compute engine, but it is not sufficient by itself. In this session we'll review how a set of open-source tools, including Jupyter and Livy, can be combined with advanced resource management and cloud elasticity to provide a comprehensive interactive platform for big data.
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin... (Databricks)
During the last year, the team at IBM Research in Ireland has been using Apache Spark to perform analytics on large volumes of sensor data. These applications need to be executed on a daily basis; therefore, it was essential for them to understand Spark resource utilization. They found it cumbersome to manually consume and efficiently inspect the CSV files for the metrics generated at the Spark worker nodes.
Although using an external monitoring system like Ganglia would automate this process, they were still plagued with the inability to derive temporal associations between system-level metrics (e.g. CPU utilization) and job-level metrics (e.g. job or stage ID) as reported by Spark. For instance, they were not able to trace back the root cause of a peak in HDFS Reads or CPU usage to the code in their Spark application causing the bottleneck.
To overcome these limitations, they developed SparkOscope. To take advantage of the job-level information already available and to minimize source-code pollution, SparkOscope uses the existing Spark Web UI to monitor and visualize job-level metrics of a Spark application (e.g. completion time). More importantly, it extends the Web UI with a palette of system-level metrics for the server/VM/container that each of the Spark job's executors ran on. Using SparkOscope, you can navigate to any completed application and identify application-logic bottlenecks by inspecting in-depth time series for all relevant system-level metrics related to the Spark executors, while easily associating them with the stages, jobs and even source code lines incurring the bottleneck.
They have made SparkOscope available as a standalone module, and have also extended the available sinks (MongoDB, MySQL).
The Analytic Platform behind IBM's Watson Data Platform - Big Data Spain 2017 (Luciano Resende)
IBM has built a "Data Science Experience" cloud service that exposes notebook services at web scale. Behind this service are various components that power the platform, including Jupyter Notebooks, an enterprise gateway that manages the execution of the Jupyter kernels, and an Apache Spark cluster that powers the computation. In this session we will describe our experience and best practices in putting together this analytical platform-as-a-service based on Jupyter Notebooks and Apache Spark, and in particular how we built the Enterprise Gateway that enables all the notebooks to share the Spark cluster's computational resources.
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ... (Databricks)
Dr. Elephant helps improve Spark and Hadoop developer productivity and increase cluster efficiency by making clear recommendations on how to tune workloads and configurations. Originally developed by LinkedIn, Dr. Elephant is now in use at multiple sites.
This session will explore how Dr. Elephant works, the data it collects from Spark environments and the customizable heuristics that generate tuning recommendations. Learn how Dr. Elephant can be used to improve production cluster operations, help developers avoid common issues, and green light applications for use on production clusters.
Agenda:
• Brief overview of the Spark-provided spark-shell and spark-submit
• Overview of SparkContext
• Overview of Zeppelin and Jupyter notebooks for Spark
• Introduction to IBM Spark Kernel
• Introduction to Cloudera Livy and Spark JobServer
Github Link:
Previous meetups:
1) Introduction to Resilient Distributed Dataset and deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-and-resilient-distributed-dataset-basics-and-deep-dive
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/225159947/
Video: https://www.youtube.com/watch?v=MkeRWyF1y_0
Github: https://github.com/SatyaNarayan1/spark_meetup
2) Introduction to Spark DataFrames/SQL and Deep dive
Slides: http://www.slideshare.net/sachinparmarss/deep-dive-spark-data-frames-sql-and-catalyst-optimizer
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/226419828/
Video: https://www.youtube.com/watch?v=h71MNWRv99M
Github: https://github.com/parmarsachin/spark-dataframe-demo
3) Apache Spark - Introduction to Spark Streaming and Deep dive
Slides: http://www.slideshare.net/differentsachin/apache-spark-introduction-to-spark-streaming-and-deep-dive-57671774
Meetup: http://www.meetup.com/Big-Data-Developers-in-Bangalore/events/227008581/
Video:
Github: https://github.com/agsachin/spark-meetup
Looking forward to a great interactive session. Please do provide feedback.
High Performance Enterprise Data Processing with Apache Spark with Sandeep Va... (Spark Summit)
Data engineering to support reporting and analytics for commercial life sciences groups consists of very complex interdependent processing with highly complex business rules (thousands of transformations on hundreds of data sources). We will talk about our experiences in building a very high performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance. We will touch upon optimizing enterprise-grade Spark architecture for data warehousing and data mart type applications, optimizing end-to-end pipelines for extreme performance, running hundreds of jobs in parallel in Spark, orchestrating across multiple Spark clusters, and some guidelines for high-speed platform and application development within enterprises. Key takeaways:
– example architecture for complex data warehousing and data mart applications on Spark
– architecture to build high performance Spark platforms for enterprises that balance functionality with total cost of ownership
– orchestrating multiple elastic Spark clusters while running hundreds of jobs in parallel
– business benefits of high performance data engineering, especially for life sciences
Streaming Analytics with Spark, Kafka, Cassandra and Akka (Helena Edelson)
This talk will address how a new architecture is emerging for analytics, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK). Popular architectures like Lambda separate layers of computation and delivery, and require many technologies with overlapping functionality. Some of this results in duplicated code, untyped processes, or high operational overhead, not to mention the cost (e.g. of ETL). I will discuss the problem domain and what is needed in terms of strategies, architecture, and application design and code to begin leveraging simpler data flows. We will cover how this particular set of technologies addresses common requirements and how, collaboratively, they work together to enrich and reinforce each other.
DEVNET-1106 Upcoming Services in OpenStack (Cisco DevNet)
There are several new upcoming OpenStack projects/services built upon the core OpenStack infrastructure services. This session will first briefly discuss the new changes introduced to the project governance structure in OpenStack. The focus of the presentation will then be to provide feature and architecture details on a few of the new projects and services in OpenStack: Trove (Database Service), Sahara (Data Processing Service), Congress (Policy Service) and Magnum (Container Service). A summary of other OpenStack-related services will also be provided.
[Presented at All Things Open 2015 in Raleigh, NC, USA]
OpenStack is one of the fastest-growing and most exciting open source projects of our time. OpenStack has drawn together technologists from all over the world to create a cloud operating system and a huge, diverse community behind it. This talk will provide an introduction to OpenStack for newcomers to the project or those who just want to know more. We'll take a brief look at OpenStack's history, get a technical overview of the project, learn how to contribute, and check out a few emerging trends and hot topics in the OpenStack world.
Introduction to OpenStack - An Overview (SpringPeople)
OpenStack is a free and open-source software platform for cloud computing, mostly deployed as IaaS. In these slides, we will cover:
- Evolution of OpenStack
- Cloud, its types and advantages
- Importance and overview of OpenStack
- OpenStack course syllabus
We repeat an introductory presentation on the OpenStack project, as many of our new members have asked for a complete overview. During this presentation we will visit the different components and provide a high-level description of the architecture of the OpenStack software. We will also talk about the community around the project and, as usual, discuss any issues raised by the attendees.
This is a great chance to get to know the internals of OpenStack better, so I highly recommend sharing it with any interested party.
OpenStack: Toward a More Resilient Cloud (Mark Voelker)
Since its inception over four years ago, OpenStack has become the most popular open source software for building many types of clouds, in part due to the flexibility it provides. As adoption increases, so has interest in building OpenStack clouds on a highly available control plane infrastructure. In this talk we will provide an introduction to today's OpenStack community and software, then dive deeper into how to build more highly available, scalable OpenStack architectures. See more at: http://www.percona.com/news-and-events/percona-university-smart-data-raleigh/openstack-toward-more-resilient-cloud
Learn how and why John McDonough contributes to Ansible, and how you can too. We'll arm you with what you need to know: things like Python, Git, and YAML.
Rome 2017: Building advanced voice assistants and chat bots (Cisco DevNet)
While it takes minutes to code a simple bot, building professional bots represents quite a challenge. You soon realize you need serious programming and API architecture experience, but also bot-specific skills. In this session, we'll first show the code of advanced chat and voice interactions, then explore the challenges faced when building advanced bots (context storage, NLP approaches, bot metadata, OAuth scopes), and discuss interesting opportunities arising from the latest industry trends (bot platforms, serverless, microservices). This talk is about showing the code and sharing lessons learned.
How to Build Advanced Voice Assistants and Chatbots (Cisco DevNet)
Learn more about the CodeMotion Voice Machine and Cisco DevNet Chatbot. Understand what a typical bot journey is and where to go to get more information about Cisco Spark and Tropo.
Cisco Spark and Tropo and the Programmable Web (Cisco DevNet)
Learn how Cisco Spark and Tropo collaboration features can be easily combined with hundreds of cloud APIs to build sophisticated, flexible workflows via a new breed of programmable web solutions from 'Integration Platform as a Service (iPaaS)' partners like Built.io, Zapier and IFTTT. This session covers multiple real-world Cisco+iPaaS use-cases, and includes a hands-on walk-through demonstrating how to build a Spark+Tropo sample application using Built.io.
Watch the BRK-DEV2004 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92557&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
Device Programmability with Cisco Plug-n-Play Solution (Cisco DevNet)
The Cisco Open Plug-n-Play solution allows customers to reduce the costs associated with the deployment and installation of network devices, and to increase the speed and reduce the complexity of deployments without compromising security. Using the Cisco Plug-n-Play solution, customers can do zero-touch installs of Cisco gear in various deployment scenarios and locations.
Watch the DevNet 2052 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=91108&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
Building a WiFi Hotspot with NodeJS: Cisco Meraki - ExCap API (Cisco DevNet)
Captive portals, also known as splash pages, are a common requirement for guest WiFi. Captive portals typically deliver branding, terms of service and a simple login process before authenticating the client onto the network. By leveraging the Meraki ExCap API, developers can customize this experience based on their requirements. This deep dive will walk through the various API options:
- Click-through vs. sign-on splash pages
- Programming a click-through and a sign-on (w/ RADIUS) splash page using NodeJS
- Programming a click-through splash page with Node-RED
- Leveraging OAuth for social login support
Watch the DevNet 2049 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92727&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
Application Visibility and Experience through Flexible NetFlow (Cisco DevNet)
The world of applications is changing rapidly in the enterprise: applications are increasingly hosted in the cloud, diverse in nature, and consumed by many devices. The need for organizations and network administrators to focus on "Fast IT" and innovation in the enterprise is growing, which means spending less time on daily operations, maintenance and troubleshooting, and more time on delivering business value with newer services. Cisco AVC with its NBAR2 technology is designed to detect applications and measure application performance by measuring round trip time, retransmission rates, jitter, delay, packet loss, MOS, URL statistics, etc. Those details are transmitted using Flexible NetFlow/IPFIX, so partners can leverage the data for application usage reporting, performance reporting and troubleshooting application issues to deliver the best possible application experience.
Watch the DevNet 2047 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92664&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
The WAN Automation Engine (WAE) is a software platform that provides multivendor and multilayer visibility and analysis for service provider and large enterprise networks. It plays a critical role in answering key questions of network resource availability, and when appropriate can automate and simplify Traffic Engineering mechanisms such as RSVP-TE and Segment Routing. This session will focus on use-cases and APIs for developers.
Watch the DevNet 2035 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92720&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
Cisco's Open Device Programmability Strategy: Open Discussion (Cisco DevNet)
Cisco DNA is an open and extensible, software-driven architecture built on a set of design principles with the objective of providing:
- Insights & Actions to drive faster business innovation
- Automation & Assurance to lower IT costs and complexity while meeting business and user expectations
- Security & Compliance to reduce risk as the organization continues to expand and grow
The architecture extends to Cisco network elements.
This session will focus on the open, model-driven, programmable interfaces available across Cisco's network elements which enable you to leverage and extend your network through applications that directly access the routers and switches in your network.
Watch the DevNet 1028 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=91041&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF) (Cisco DevNet)
In this small-group, hands-on workshop session you'll learn how to write your first Python application that uses YANG, NETCONF and RESTCONF to access operational and configuration data on a device.
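For a flavor of what such an application looks like, here is a minimal RESTCONF sketch (per RFC 8040) in Python. The device address and credentials are placeholders, and it assumes the device serves the standard ietf-interfaces YANG model.

import requests

DEVICE = "https://10.0.0.1"           # hypothetical device address
AUTH = ("admin", "password")          # placeholder credentials
HEADERS = {"Accept": "application/yang-data+json"}

# Fetch interface data as YANG-modeled JSON from the RESTCONF datastore.
url = DEVICE + "/restconf/data/ietf-interfaces:interfaces"
resp = requests.get(url, auth=AUTH, headers=HEADERS, verify=False)
resp.raise_for_status()
for intf in resp.json()["ietf-interfaces:interfaces"]["interface"]:
    print(intf["name"], intf.get("description", ""))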
Watch the DevNet 2044 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92725&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
NETCONF & YANG Enablement of Network Devices (Cisco DevNet)
A technical discussion and a demo showing how Tail-f's ConfD management agent can be used to implement NETCONF and YANG, the industry-leading solution for providing a programmable management interface in a network element. ConfD is recognized as best-in-breed embedded software for implementing management functions in network elements, including physical devices and virtualized network functions (VNFs) for NFV.
This workshop is a best fit for engineers who are involved in the design and development of embedded software for network devices. Attendees will gain a basic understanding of what NETCONF and YANG are and how ConfD provides a solution for embedding this technology in network devices. More information about ConfD can be found at: https://developer.cisco.com/site/confD/
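For a sense of the client side of this interface, here is a minimal NETCONF sketch in Python using the open-source ncclient library; the device address and credentials are placeholder assumptions.

import xml.dom.minidom
from ncclient import manager

# Placeholder connection details for a hypothetical NETCONF-enabled device.
with manager.connect(host="10.0.0.1", port=830,
                     username="admin", password="password",
                     hostkey_verify=False) as m:
    # List the capabilities (including YANG models) the device advertises.
    for cap in m.server_capabilities:
        print(cap)
    # Fetch the running configuration and pretty-print the returned XML.
    reply = m.get_config(source="running")
    print(xml.dom.minidom.parseString(reply.data_xml).toprettyxml())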
Watch the DevNet 1216 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92703&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
UCS Management APIs: A Technical Deep Dive (Cisco DevNet)
Underneath the UCS Python SDK, PowerShell libraries, and VMware and OpenStack plugins is the UCS XML API itself. This session will go deep into the API and explain how the SDK, libraries and plugins actually communicate with UCS components. We will cover API session management, queries, query filters, configuration methods, functions and event subscription. Understanding the low-level UCS APIs and object model will enable you to build your own programmatic interface into your UCS environments, in the language you like, on the platform of your choosing.
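As a hedged illustration of the session management and query methods mentioned above, this sketch logs in to the UCS Manager XML API, resolves a class, and logs out; the endpoint address and credentials are placeholders.

import requests
import xml.etree.ElementTree as ET

UCSM = "https://10.0.0.1/nuova"   # hypothetical UCS Manager XML API endpoint

# aaaLogin returns a session cookie used to authenticate subsequent methods.
login = requests.post(UCSM, data='<aaaLogin inName="admin" inPassword="password"/>',
                      verify=False)
cookie = ET.fromstring(login.text).attrib["outCookie"]

# configResolveClass queries all managed objects of a given class.
query = '<configResolveClass cookie="%s" classId="computeBlade"/>' % cookie
blades = requests.post(UCSM, data=query, verify=False)
for blade in ET.fromstring(blades.text).iter("computeBlade"):
    print(blade.attrib.get("dn"), blade.attrib.get("model"))

# Always log out to free the session.
requests.post(UCSM, data='<aaaLogout inCookie="%s"/>' % cookie, verify=False)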
Watch the DevNet 3003 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=91099&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
The DevOps model is rapidly transforming IT operations and development practices. But what are the precursors necessary to implement DevOps? To achieve an agile, virtualized, and highly automated IT environment, what technological requirements need to be in place? OpenStack has the potential to facilitate DevOps implementation and practices at several different layers in the data center. In this session we'll quickly discuss what DevOps is, then discuss many components that are logically required to move towards DevOps in your environment. Finally we'll explore in depth several ways OpenStack can provide these baseline components.
Watch the DevNet 1104 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92695&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
What is Tropo, how do you use it, and what can you use it for? In this session, you'll learn how Tropo works, see some real-life examples, and learn how to create your own voice and SMS applications in minutes.
Watch the DevNet 1023 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=91050&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
DevNet Express - Spark & Tropo API - Lisbon May 2016 (Cisco DevNet)
Direct from the Cisco DevNet Lisbon, Portugal Express event in May 2016. Learn about Cisco DevNet and the Spark and Tropo APIs, and why there's never been a better time to innovate with Cisco.
Direct from DevNet@TAG in Milan and Rome in May 2016! Learn about Cisco DevNet and the Spark and Tropo APIs, and why there's never been a better time to innovate with Cisco.
6. OpenStack Progress

Release timeline:
• Austin – Oct 2010 (started with Compute and Storage services; 130 contributors, 30 new features)
• Bexar – Feb 2011
• Cactus – April 2011
• Diablo – Sept 2011
• Essex – April 2012
• Folsom – Sept 2012
• Grizzly – April 2013
• Havana – Oct 2013
• Icehouse – April 2014
• Juno – Oct 2014
• Kilo – May 2015
• Liberty – Oct 2015 (12th OpenStack release: 1,933 contributors, 760 new features, 8,300 bugs fixed, 164 companies)
• Mitaka – April 2016
• Newton – Oct 2016

Community: 24,000 people, 495 companies
8. Trove

• Database as a Service: automating complex database administrative tasks such as deployment, configuration, scaling and HA
• Single admin tenant; one database per Nova instance
• Datastore types: relational and non-relational
• Pluggable: support for MySQL, PostgreSQL and NoSQL stores (Cassandra, MongoDB, Couchbase, Redis)
• Integration with other projects: Designate, Heat, Neutron

API / Functionality:
• Management: create/delete/show/list database instances, databases, users, flavors
• Security: support for security groups; no SSH by default
• Configuration groups: support for user-defined configuration settings (MySQL, MongoDB), e.g. max_connections, buffer pool size
• Backups: support for full and incremental backups using Swift (MySQL, Cassandra, Couchbase)
• Replication: async MySQL master-slave replication from a snapshot of the master
• Clustering: support for shards; three-member replica sets (MongoDB)
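To make the Management API above concrete, here is a minimal sketch of creating a MySQL instance through Trove's REST API; the endpoint, token, flavor and volume values are illustrative placeholders, not values from the slides.

import json
import requests

# Illustrative placeholders: a real client would obtain the endpoint and
# token from Keystone. Trove's API service listens on port 8779 by default.
TROVE = "http://controller:8779/v1.0/<tenant-id>"
HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

# Create a MySQL instance with one database and one user (hypothetical values).
body = {"instance": {
    "name": "mysql-demo",
    "flavorRef": "2",                      # Nova flavor backing the instance
    "volume": {"size": 2},                 # Cinder volume size in GB
    "datastore": {"type": "mysql", "version": "5.6"},
    "databases": [{"name": "sampledb"}],
    "users": [{"name": "demo", "password": "secret",
               "databases": [{"name": "sampledb"}]}],
}}

resp = requests.post(TROVE + "/instances", headers=HEADERS, data=json.dumps(body))
print(resp.status_code, resp.json())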
10. Sahara

• Cluster provisioning: create and manage Hadoop clusters
• Node Group Templates: define the instances/nodes within a cluster that will each run selected Hadoop processes and store data
• Plugins: responsible for provisioning the Hadoop cluster (Vanilla, Hortonworks, MapR, Cloudera, Spark)
• Cluster Templates: define which Node Groups are to be included and how many instances are to be created in each
• Anti-Affinity Groups: processes that may not be launched more than once on a single host
• Cluster: represents a Hadoop cluster run using a Cluster Template
• Image Registry: used to provide additional information about images using tags
• Cluster Scaling: change the instances in an existing Node Group or add new Node Groups
• Configure HDFS and MapReduce parameters at node and cluster level
• Integration with Cinder, Swift, Neutron, Heat

API / Functionality:
• Data Sources: store URLs designating the location of input and output data
• Job Binaries: URLs to scripts or JAR files stored in the internal DB or Swift
• Jobs: specify the job and list all the individual Job Binary objects required for execution
• Job Execution: monitor and manage a job executed/launched on a cluster
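As a hedged sketch of how the provisioning objects above are exercised, the following creates a node group template through Sahara's REST API; the endpoint, token and template values are illustrative assumptions.

import json
import requests

# Placeholder endpoint/token; Sahara's API service listens on 8386 by default.
SAHARA = "http://controller:8386/v1.1/<tenant-id>"
HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

# A node group template for a vanilla-plugin master node (illustrative values).
template = {
    "name": "master-template",
    "plugin_name": "vanilla",
    "hadoop_version": "2.7.1",
    "flavor_id": "2",
    "node_processes": ["namenode", "resourcemanager"],
}

resp = requests.post(SAHARA + "/node-group-templates",
                     headers=HEADERS, data=json.dumps(template))
print(resp.status_code, resp.json())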
12. Ironic

• Service for bare metal management
• Ironic API: RESTful API service
• Ironic Conductor: interacts with hardware; asynchronous handling of both requested and periodic actions
• Ironic Python Agent: utility service temporarily booted on machines to provide remote access to hardware for provisioning and management
• Ironic Drivers: communicate with hardware devices
• Nova driver: bare metal servers can be provisioned through the Nova API
• Based on open technologies: DHCP, TFTP, PXE (in a PXE environment, TFTP is used to download the NBP over the network using information from the DHCP server)
• Ironic API resources: Chassis, Drivers, Links, Nodes, Ports
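A minimal sketch of talking to the Ironic API described above, listing registered bare metal nodes with their power and provision states; the endpoint and token are placeholders.

import requests

# Placeholder endpoint/token; Ironic's API service listens on 6385 by default.
IRONIC = "http://controller:6385"
HEADERS = {"X-Auth-Token": "<token>"}

# /v1/nodes/detail returns each node with its driver, power and provision state.
resp = requests.get(IRONIC + "/v1/nodes/detail", headers=HEADERS)
for node in resp.json()["nodes"]:
    print(node["uuid"], node["power_state"], node["provision_state"])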
14. Magnum

• Nova container virtualization drivers: Docker, LXC, OpenVZ, ZeroVM
• Heat resource for Docker
• Container as a Service (the "Nova of containers"): provides a REST API for container management
• Provides app isolation, portability and manageability with containers
• Runs containers in VMs, on bare metal, or in containers
• Resources:
  • Container: a container
  • Node: a bare metal or virtual machine where work executes
  • Bay: a collection of nodes where work is scheduled
  • Service: a port-to-Pod mapping
  • Pod: a collection of containers running on one physical or virtual machine
• [Diagram: Magnum drives Kubernetes, Docker and Mesos endpoints; it launches instances with an agent for hosting containers, with operations on Service and Pod objects (Kubernetes) and on Container objects (Docker)]
• Integration with Kubernetes, Docker, Mesos
• Companies involved: Rackspace, RedHat, Cisco and others
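To illustrate the Bay resource above, here is a hedged sketch against Magnum's bay-era REST API (later releases renamed bays to clusters); the endpoint, token and baymodel UUID are placeholders.

import json
import requests

# Placeholder endpoint/token; Magnum's API service listens on 9511 by default.
MAGNUM = "http://controller:9511/v1"
HEADERS = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

# Create a bay (a collection of nodes) from a previously defined baymodel.
bay = {"name": "k8s-demo", "baymodel_id": "<baymodel-uuid>", "node_count": 2}
resp = requests.post(MAGNUM + "/bays", headers=HEADERS, data=json.dumps(bay))
print(resp.status_code, resp.json())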
16. Kolla

• Containerization of OpenStack services: all "core" OpenStack services implemented as micro-services in Docker containers
• Technology:
  • Docker: deploying containers and managing images
  • Ansible: orchestration tool for multi-node deployment
  • Jinja2: templating language for Python
• Developing and deploying OpenStack services using Kolla:
  • Supports deployment from binary and source across multiple distributions: CentOS, Debian, Fedora, Oracle Linux, RHEL and Ubuntu
  • AIO and multinode deployment using Ansible with n-way active high availability
  • Development environments using Heat, Vagrant, or bare metal
• [Diagram: a node running containerized Keystone, Nova, Glance, Neutron, Horizon, ...]
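The slide names Jinja2 as Kolla's templating layer. As a small illustration of that mechanism (not Kolla's actual template files), here is how a service configuration snippet can be rendered in Python:

from jinja2 import Template

# Illustrative only: Kolla's real templates live in its repository; this just
# demonstrates the Jinja2 rendering the slide refers to.
template = Template(
    "[DEFAULT]\n"
    "transport_url = rabbit://{{ rabbit_user }}:{{ rabbit_password }}@{{ rabbit_host }}:5672\n"
    "debug = {{ debug | default(false) }}\n"
)
print(template.render(rabbit_user="openstack",
                      rabbit_password="secret",
                      rabbit_host="10.0.0.10"))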
17. Summary

• The OpenStack services ecosystem is expanding
• Lots of opportunities to contribute and influence the community