Talk Abstract
A common tradeoff made by fault-tolerant, distributed systems is the ease of user interaction with the system. Implementing correct distributed operations in the face of failures often takes priority over reducing the level of effort required to use the system. Because of this, applying a problem in a specific domain to the system can require significant planning and effort by the user. Apache Accumulo, and its sorted, Key-Value data model, is subject to this same problem: it is often difficult to use Accumulo to quickly ascertain real-life answers about some concrete problem.
This problem, not unique to Accumulo itself, has spurred the growth of numerous projects to fill these kinds of gaps in usability, in addition to multiple language bindings provided by applications. Outside of the Java API, Accumulo client support varies from programming languages, like Python or Ruby, to standalone projects that provide their own query language, such as Apache Pig and Apache Hive. This talk will cover the state of client support outside of Accumulo’s Java API with an emphasis on the pros, cons, and best practices of each alternative.
Speaker
Josh Elser
Member of Technical Staff, Hortonworks
Josh is a member of the engineering staff at Hortonworks. He is strong advocate for open source software and is an Apache Accumulo committer and PMC member. He is also a committer and PMC member of Apache Slider (incubating) and regularly contributes to other Apache projects in the Apache Hadoop ecosystem. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
Accumulo Summit 2015: Ambari and Accumulo: HDP 2.3 Upcoming Features [Sponsored]Accumulo Summit
Talk Abstract
The upcoming Hortonworks Data Platform (HDP) 2.3 includes significant additions to Accumulo, within the project itself and in its interactions with the larger Hadoop ecosystem.This session will cover high-level changes that improve usability, management and security of Accumulo. Administrators of Accumulo now have the ability to deploy, manage and dynamically configure Accumulo clusters using Apache Ambari.As a part of Ambari integration, the metrics system in Accumulo has been updated to use the standard “Hadoop Metrics2” metrics subsystem which provides native Ganglia and Graphite support as well as supporting the new Ambari Metrics System. On the security front,Accumulo was also improved to support client authentication via Kerberos, while earlier versions of Accumulo only supported Kerberos authentication for server processes.With these changes,Accumulo clients can authenticate solely using their Kerberos identity across the entire Hadoop cluster without the need to manage passwords.
Speakers
Billie Rinaldi Senior Member of Technical Staff, Hortonworks
Billie Rinaldi is a Senior Member of Technical Staff at Hortonworks, Inc., currently prototyping new features related to application monitoring and deployment in the Apache Hadoop ecosystem. Prior to August 2012, Billie engaged in big data science and research at the National Security Agency. Since 2008, she has been providing technical leadership regarding the software that is now Apache Accumulo. Billie is the VP of Apache Accumulo, the Accumulo Project Management Committee Chair, and a member of the Apache Software Foundation. She holds a Ph.D. in applied mathematics from Rensselaer Polytechnic Institute.
Josh Elser Member of Technical Staff, Hortonworks
Josh is a member of the engineering staff at Hortonworks. He is strong advocate for open source software and is an Apache Accumulo committer and PMC member. He is also a committer and PMC member of Apache Slider (incubating) and regularly contributes to other Apache projects in the Apache Hadoop ecosystem. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
Using Ansible to deploy a 6-node Hortonworks Data Platform (hadoop) cluster on AWS with the ObjectRocket ansible-hadoop playbook.
Presented at the Ansible NOVA MeetUp on February 23, 2017: https://www.meetup.com/Ansible-NOVA/events/236853616/
Demand for cloud is through the roof. Cloud is turbo charging the Enterprise IT landscape with agility and flexibility. And now, discussions of cloud architecture dominate Enterprise IT. Cloud is enabling many ephemeral on-demand use cases which is a game changing opportunity for analytic workloads. But all of this comes with the challenges of running enterprise workloads in the cloud securely and with ease.
In this session, we will take you through Cloudbreak as a solution to simplify provisioning and managing enterprise workloads while providing an open and common experience for deploying workloads across clouds. We will discuss the challenges (and opportunities) to run enterprise workloads in the cloud and will go through a live demo of how the latest from Cloudbreak enables enterprises to easily and securely run Apache Hadoop. This includes deep-dive discussion on Ambari Blueprints, recipes, custom images, and enabling Kerberos -- which are all key capabilities for Enterprise deployments.
Speakers
Jeff Sposetti, VP Product Management, Hortonworks
Attila Kanto, Principal Engineer, Hortonworks
As Hadoop becomes the defacto big data platform, enterprises deploy HDP across wide range of physical and virtual environments spanning private and public clouds. This session will cover key considerations for cloud deployment and showcase Cloudbreak for simple and consistent deployment across cloud providers of choice.
Hortonworks technical workshop operations with ambariHortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Cutting-edge Hadoop clusters are bound to need custom (add-on) services that are not available in the Hadoop distribution of their choice. Agility is crucial for companies to integrate any service into existing large-scale Hadoop clusters with ease.
Apache Ambari manages the Hadoop cluster and solves this problem by extending the stack with add-on services, which can be a new Apache project, different Hadoop file system, or internal tool. This talk covers how to create a service definition in Ambari to manage lifecycle commands and configs, plus advanced topics like packaging, installing from multiple repositories, recommending and validating configs using Service Advisor, running custom commands, defining dependencies on configs and other services, and more. We will also cover how to create custom metrics and dashboards using Ambari Metric System and Grafana, generating alerts, and enabling security by authenticating with Kerberos.
Further, we will discuss the future of service definitions and how Ambari 3.0 will support custom services through Management Packs to enable Hadoop vendors to release software faster.
Speaker
Jayush Luniya, Principal Software Engineer, Hortonworks
Apache Ambari BOF Meet Up @ Hadoop Summit 2013
APIs and SPIs – How to Integrate with Ambari
http://www.meetup.com/Apache-Ambari-User-Group/events/119184782/
Accumulo Summit 2015: Ambari and Accumulo: HDP 2.3 Upcoming Features [Sponsored]Accumulo Summit
Talk Abstract
The upcoming Hortonworks Data Platform (HDP) 2.3 includes significant additions to Accumulo, within the project itself and in its interactions with the larger Hadoop ecosystem.This session will cover high-level changes that improve usability, management and security of Accumulo. Administrators of Accumulo now have the ability to deploy, manage and dynamically configure Accumulo clusters using Apache Ambari.As a part of Ambari integration, the metrics system in Accumulo has been updated to use the standard “Hadoop Metrics2” metrics subsystem which provides native Ganglia and Graphite support as well as supporting the new Ambari Metrics System. On the security front,Accumulo was also improved to support client authentication via Kerberos, while earlier versions of Accumulo only supported Kerberos authentication for server processes.With these changes,Accumulo clients can authenticate solely using their Kerberos identity across the entire Hadoop cluster without the need to manage passwords.
Speakers
Billie Rinaldi Senior Member of Technical Staff, Hortonworks
Billie Rinaldi is a Senior Member of Technical Staff at Hortonworks, Inc., currently prototyping new features related to application monitoring and deployment in the Apache Hadoop ecosystem. Prior to August 2012, Billie engaged in big data science and research at the National Security Agency. Since 2008, she has been providing technical leadership regarding the software that is now Apache Accumulo. Billie is the VP of Apache Accumulo, the Accumulo Project Management Committee Chair, and a member of the Apache Software Foundation. She holds a Ph.D. in applied mathematics from Rensselaer Polytechnic Institute.
Josh Elser Member of Technical Staff, Hortonworks
Josh is a member of the engineering staff at Hortonworks. He is strong advocate for open source software and is an Apache Accumulo committer and PMC member. He is also a committer and PMC member of Apache Slider (incubating) and regularly contributes to other Apache projects in the Apache Hadoop ecosystem. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
Using Ansible to deploy a 6-node Hortonworks Data Platform (hadoop) cluster on AWS with the ObjectRocket ansible-hadoop playbook.
Presented at the Ansible NOVA MeetUp on February 23, 2017: https://www.meetup.com/Ansible-NOVA/events/236853616/
Demand for cloud is through the roof. Cloud is turbo charging the Enterprise IT landscape with agility and flexibility. And now, discussions of cloud architecture dominate Enterprise IT. Cloud is enabling many ephemeral on-demand use cases which is a game changing opportunity for analytic workloads. But all of this comes with the challenges of running enterprise workloads in the cloud securely and with ease.
In this session, we will take you through Cloudbreak as a solution to simplify provisioning and managing enterprise workloads while providing an open and common experience for deploying workloads across clouds. We will discuss the challenges (and opportunities) to run enterprise workloads in the cloud and will go through a live demo of how the latest from Cloudbreak enables enterprises to easily and securely run Apache Hadoop. This includes deep-dive discussion on Ambari Blueprints, recipes, custom images, and enabling Kerberos -- which are all key capabilities for Enterprise deployments.
Speakers
Jeff Sposetti, VP Product Management, Hortonworks
Attila Kanto, Principal Engineer, Hortonworks
As Hadoop becomes the defacto big data platform, enterprises deploy HDP across wide range of physical and virtual environments spanning private and public clouds. This session will cover key considerations for cloud deployment and showcase Cloudbreak for simple and consistent deployment across cloud providers of choice.
Hortonworks technical workshop operations with ambariHortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Cutting-edge Hadoop clusters are bound to need custom (add-on) services that are not available in the Hadoop distribution of their choice. Agility is crucial for companies to integrate any service into existing large-scale Hadoop clusters with ease.
Apache Ambari manages the Hadoop cluster and solves this problem by extending the stack with add-on services, which can be a new Apache project, different Hadoop file system, or internal tool. This talk covers how to create a service definition in Ambari to manage lifecycle commands and configs, plus advanced topics like packaging, installing from multiple repositories, recommending and validating configs using Service Advisor, running custom commands, defining dependencies on configs and other services, and more. We will also cover how to create custom metrics and dashboards using Ambari Metric System and Grafana, generating alerts, and enabling security by authenticating with Kerberos.
Further, we will discuss the future of service definitions and how Ambari 3.0 will support custom services through Management Packs to enable Hadoop vendors to release software faster.
Speaker
Jayush Luniya, Principal Software Engineer, Hortonworks
Apache Ambari BOF Meet Up @ Hadoop Summit 2013
APIs and SPIs – How to Integrate with Ambari
http://www.meetup.com/Apache-Ambari-User-Group/events/119184782/
Apache Hadoop YARN is the resource and application manager for Apache Hadoop. In the past, YARN only supported launching containers as processes. However, as containerization has become extremely popular, more and more users wanted support for launching Docker containers. With recent changes to YARN, it now supports running Docker containers alongside process containers. Couple this with the newly added support for running services on YARN and it allows a host of new possibilities. In this talk, we'll present how to run a potential container cloud on YARN. Leveraging the support in YARN for Docker and services, we can allow users to spin up a bunch of Docker containers for their applications. These containers can be self contained or wired up to form more complex applications(using the Assemblies support in YARN). We will go over some of the lessons we learned as part of our experiences handling issues such as resource management, debugging application failures, running Docker, etc.
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDevWerner Keil
At Maersk Line, not only the world's biggest ships 'Triple-E' class vessels are currently being built. Continuous Integration and Delivery on a similar scale using Hudson, Maven and tools like Kokki similar to Puppet or Chef are also practiced there.
This session is going to give a brief overview of Multi-Configuration (Matrix) job types used in most of these projects. Things are being built and deployed in a heterogenous environment, otherwise probably found only at large vendors of Public Cloud services.
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. As Hive continues to grow its support for analytics, reporting, and interactive query, the community is hard at work in improving it along with many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, the extension of ACID semantics to non-ORC data, and workload management are some noteworthy new features.
We will discuss optimizations which provide major performance gains as well as integration with other big data technologies such as Apache Spark, Druid, and Kafka. The talk will also provide a glimpse of what is expected to come in the near future.
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop and Hortonworks Data Platform (HDP). Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers views to the end user. In this session Hortonworks will cover 3 key integration points of Apache Ambari including Stacks, Views and Blueprints and deliver working examples of each.
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
Imagine if you could build and deploy an end to end complex streaming analytics app on a streaming engine like Storm or Flink that did the following:
1. Joining Streams
2. Aggregations over Windows (Time or Count based)
3. Complex Event Processing
4. Pattern Matching
5. Model scoring.
Now imagine implementing and deploying this without writing a single line of code in under 10 mins.
Imagine no more; it is indeed here. In this talk, we will discuss an exciting open source project led by Hortonworks on building and deploying streaming applications using a drag and drop paradigm.
Enabling a hardware accelerated deep learning data science experience for Apa...DataWorks Summit
Deep learning techniques are finding significant commercial success in a wide variety of industries. Large unstructured data sets such as images, videos, speech and text are great for deep learning, but impose a lot of demands on computing resources. New types of hardware architectures such as GPUs and faster interconnects (e.g. NVLink), RDMA capable networking interface from Mellanox available on OpenPOWER and IBM POWER systems are enabling practical speedups for deep learning. Data Scientists can intuitively incorporate deep learning capabilities on accelerated hardware using open source components such as Jupyter and Zeppelin notebooks, RStudio, Spark, Python, Docker, and Kubernetes with IBM PowerAI. Jupyter and Apache Zeppelin integrate well with Apache Spark and Hadoop using the Apache Livy project. This session will show some deep learning build and deploy steps using Tensorflow and Caffe in Docker containers running in a hardware accelerated private cloud container service. This session will also show system architectures and best practices for deployments on accelerated hardware. INDRAJIT PODDAR, Senior Technical Staff Member, IBM
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Speaker
Alan Gates, Co-founder, Hortonworks
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
Kerberos is the ubiquitous authentication mechanism when it comes to secure any Hadoop Services. With recent updates in Hadoop core and various Apache Hadoop components, inherent Kerberos support has matured and has come a long way.
Understanding & configuring Kerberos is still a challenge but even more painful & frustrating is troubleshooting a Kerberos issue. There are lot of things (small & big) that can go wrong (and will go wrong!). This talk covers the Kerberos debugging part in detail and discusses the tools & tricks that can be used to narrow down any Kerberos issue.
Rather than discussing the issues and their resolution, we will focus on how to approach a Kerberos problem and do's / dont's in Kerberos scene. This talk will provide a step by step guide that will equip the audience for troubleshooting future Kerberos problems.
Agenda is to discuss:
- Systematic approach to Kerberos troubleshooting
- Kerberos Tools available in Hadoop arsenal
- Tips & Tricks to narrow down Kerberos issues quickly
- Some nasty Kerberos issues from Support trenches
Some prior knowledge on Kerberos basics will be appreciated but is not a prerequisite.
Speaker:
Vipin Rathor, Sr. Product Specialist (HDP Security), Hortonworks
An Overview on Optimization in Apache Hive: Past, Present, FutureDataWorks Summit
Apache Hive has been continuously evolving to support a broad range of use cases, bringing it beyond its batch processing roots to its current support for interactive queries with sub-second response times using LLAP. However, the development of its execution internals is not sufficient to guarantee efficient performance, since poorly optimized queries can create a bottleneck in the system. Hence, each release of Hive has included new features for its optimizer aimed to generate better plans and deliver improvements to query execution. In this talk, we present the development of the optimizer since its initial release. We describe its current state and how Hive leverages the latest Apache Calcite features to generate the most efficient execution plans. We show numbers demonstrating the improvements brought to Hive performance, and we discuss future directions for the next-generation Hive optimizer, which include an enhanced cost model, materialized views support, and complex query decorrelation.
Apache Hadoop YARN is the resource and application manager for Apache Hadoop. In the past, YARN only supported launching containers as processes. However, as containerization has become extremely popular, more and more users wanted support for launching Docker containers. With recent changes to YARN, it now supports running Docker containers alongside process containers. Couple this with the newly added support for running services on YARN and it allows a host of new possibilities. In this talk, we'll present how to run a potential container cloud on YARN. Leveraging the support in YARN for Docker and services, we can allow users to spin up a bunch of Docker containers for their applications. These containers can be self contained or wired up to form more complex applications(using the Assemblies support in YARN). We will go over some of the lessons we learned as part of our experiences handling issues such as resource management, debugging application failures, running Docker, etc.
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDevWerner Keil
At Maersk Line, not only the world's biggest ships 'Triple-E' class vessels are currently being built. Continuous Integration and Delivery on a similar scale using Hudson, Maven and tools like Kokki similar to Puppet or Chef are also practiced there.
This session is going to give a brief overview of Multi-Configuration (Matrix) job types used in most of these projects. Things are being built and deployed in a heterogenous environment, otherwise probably found only at large vendors of Public Cloud services.
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. As Hive continues to grow its support for analytics, reporting, and interactive query, the community is hard at work in improving it along with many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, the extension of ACID semantics to non-ORC data, and workload management are some noteworthy new features.
We will discuss optimizations which provide major performance gains as well as integration with other big data technologies such as Apache Spark, Druid, and Kafka. The talk will also provide a glimpse of what is expected to come in the near future.
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop and Hortonworks Data Platform (HDP). Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers views to the end user. In this session Hortonworks will cover 3 key integration points of Apache Ambari including Stacks, Views and Blueprints and deliver working examples of each.
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
Imagine if you could build and deploy an end to end complex streaming analytics app on a streaming engine like Storm or Flink that did the following:
1. Joining Streams
2. Aggregations over Windows (Time or Count based)
3. Complex Event Processing
4. Pattern Matching
5. Model scoring.
Now imagine implementing and deploying this without writing a single line of code in under 10 mins.
Imagine no more; it is indeed here. In this talk, we will discuss an exciting open source project led by Hortonworks on building and deploying streaming applications using a drag and drop paradigm.
Enabling a hardware accelerated deep learning data science experience for Apa...DataWorks Summit
Deep learning techniques are finding significant commercial success in a wide variety of industries. Large unstructured data sets such as images, videos, speech and text are great for deep learning, but impose a lot of demands on computing resources. New types of hardware architectures such as GPUs and faster interconnects (e.g. NVLink), RDMA capable networking interface from Mellanox available on OpenPOWER and IBM POWER systems are enabling practical speedups for deep learning. Data Scientists can intuitively incorporate deep learning capabilities on accelerated hardware using open source components such as Jupyter and Zeppelin notebooks, RStudio, Spark, Python, Docker, and Kubernetes with IBM PowerAI. Jupyter and Apache Zeppelin integrate well with Apache Spark and Hadoop using the Apache Livy project. This session will show some deep learning build and deploy steps using Tensorflow and Caffe in Docker containers running in a hardware accelerated private cloud container service. This session will also show system architectures and best practices for deployments on accelerated hardware. INDRAJIT PODDAR, Senior Technical Staff Member, IBM
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Speaker
Alan Gates, Co-founder, Hortonworks
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
Kerberos is the ubiquitous authentication mechanism when it comes to secure any Hadoop Services. With recent updates in Hadoop core and various Apache Hadoop components, inherent Kerberos support has matured and has come a long way.
Understanding & configuring Kerberos is still a challenge but even more painful & frustrating is troubleshooting a Kerberos issue. There are lot of things (small & big) that can go wrong (and will go wrong!). This talk covers the Kerberos debugging part in detail and discusses the tools & tricks that can be used to narrow down any Kerberos issue.
Rather than discussing the issues and their resolution, we will focus on how to approach a Kerberos problem and do's / dont's in Kerberos scene. This talk will provide a step by step guide that will equip the audience for troubleshooting future Kerberos problems.
Agenda is to discuss:
- Systematic approach to Kerberos troubleshooting
- Kerberos Tools available in Hadoop arsenal
- Tips & Tricks to narrow down Kerberos issues quickly
- Some nasty Kerberos issues from Support trenches
Some prior knowledge on Kerberos basics will be appreciated but is not a prerequisite.
Speaker:
Vipin Rathor, Sr. Product Specialist (HDP Security), Hortonworks
An Overview on Optimization in Apache Hive: Past, Present, FutureDataWorks Summit
Apache Hive has been continuously evolving to support a broad range of use cases, bringing it beyond its batch processing roots to its current support for interactive queries with sub-second response times using LLAP. However, the development of its execution internals is not sufficient to guarantee efficient performance, since poorly optimized queries can create a bottleneck in the system. Hence, each release of Hive has included new features for its optimizer aimed to generate better plans and deliver improvements to query execution. In this talk, we present the development of the optimizer since its initial release. We describe its current state and how Hive leverages the latest Apache Calcite features to generate the most efficient execution plans. We show numbers demonstrating the improvements brought to Hive performance, and we discuss future directions for the next-generation Hive optimizer, which include an enhanced cost model, materialized views support, and complex query decorrelation.
Introduction to Designing and Building Big Data ApplicationsCloudera, Inc.
Learn what the course covers, from capturing data to building a search interface; the spectrum of processing engines, Apache projects, and ecosystem tools available for converged analytics; who is best suited to attend the course and what prior knowledge you should have; and the benefits of building applications with an enterprise data hub.
LLAP (Live Long and Process) is the newest query acceleration engine for Hive 2.0, which entered GA in 2017. LLAP brings into light a new set of trade-offs and optimizations that allows for efficient and secure multi-user BI systems on the cloud. In this talk, we discuss the specifics of building a modern BI engine within those boundaries, designed to be fast and cost-effective on the public cloud. The focus of the LLAP cache is to speed up common BI query patterns on the cloud, while avoiding most of the operational administration overheads of maintaining a caching layer, with an automatically coherent cache with intelligent eviction and support for custom file formats from text to ORC, and explore the possibilities of combining the cache with a transactional storage layer which supports online UPDATE and DELETES without full data reloads. LLAP by itself, as a relational data layer, extends the same caching and security advantages to any other data processing framework. We overview the structure of such a hybrid system, where both Hive and Spark use LLAP to provide SQL query acceleration on the cloud with new, improved concurrent query support and production-ready tools and UI.
Speaker
Sergey Shelukin, Member of Technical Staff, Hortonworks
Big data processing using HPCC Systems Above and Beyond HadoopHPCC Systems
Presentation delivered at Boston Data Festival September 2015. Data-intensive computing represents a new computing paradigm to address Big Data processing requirements using high-performance architectures supporting scalable parallel processing to allow government, commercial organizations, and research environments to process massive amounts of data and implement new applications previously thought to be impractical or infeasible. The fundamental challenges of data-intensive computing are managing and processing exponentially growing data volumes, significantly reducing associated data analysis cycles to support practical, timely applications, and developing new algorithms which can scale to search and process massive amounts of data. The open source HPCC (High-Performance Computing Cluster) Systems platform offers a unified approach to Big Data processing requirements: (1) a scalable, integrated computer systems hardware and software architecture designed for parallel processing of data-intensive computing applications, and (2) a new programming paradigm in the form of a high-level, declarative, data-centric programming language designed specifically for big data processing. This presentation explores the challenges of data-intensive computing from a programming perspective, and describes the ECL programming language and the HPCC architecture designed for data-intensive computing applications. HPCC is an alternative to the Hadoop platform, and ECL is compared to Pig Latin, a high-level language developed for the Hadoop MapReduce architecture.
HPCC (High Performance Computing Cluster) is a massive parallel-processing computing platform that solves Big Data problems. The platform is now Open Source!
Apache Deep Learning 101 - DWS Berlin 2018Timothy Spann
Apache Deep Learning 101 with Apache MXNet, Apache NiFi, MiniFi, Apache Tika, Apache Open NLP, Apache Spark, Apache Hive, Apache HBase, Apache Livy and Apache Hadoop. Using Python we run various existing models via MXNet Model Server and via Python APIs. We also use NLP for entity resolution
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.