HBase adoption continues to grow rapidly, driven by customer success and community innovation. With its near-limitless scalability, high reliability, and deep integration with Hadoop ecosystem tools, HBase offers enterprise developers a rich platform on which to build their next-generation applications. In this workshop we will explore HBase SQL capabilities, deep Hadoop ecosystem integrations, and deployment and management best practices.
Learn how Hortonworks DataFlow (HDF), powered by Apache NiFi, enables organizations to harness Internet of Anything (IoAT) data streams to drive business and operational insights. The session provides an overview of HDF, including a detailed hands-on lab in which we build HDF pipelines for the capture and analysis of streaming data.
Recording and labs available at:
http://hortonworks.com/partners/learn/#hdf
Double Your Hadoop Hardware Performance with SmartSense (Hortonworks)
Hortonworks SmartSense provides proactive recommendations that improve cluster performance, security, and operations. Since 30% of issues are configuration related, SmartSense makes an immediate impact on Hadoop system performance and availability, in some cases boosting hardware performance by a factor of two. Learn how SmartSense can help you increase the efficiency of your Hadoop hardware through customized cluster recommendations.
View the on-demand webinar: https://hortonworks.com/webinar/boosts-hadoop-hardware-performance-2x-smartsense/
Hortonworks Technical Workshop: What's New in HDP 2.3 (Hortonworks)
The recently launched HDP 2.3 is a major advancement of Open Enterprise Hadoop. It represents the best of community-led development, with innovations spanning Apache Hadoop, Apache Ambari, Ranger, HBase, Spark, and Storm. In this session we will provide an in-depth overview of the new functionality and discuss its impact on new and ongoing big data initiatives.
Webinar Series Part 5: New Features of HDF 5 (Hortonworks)
An overview of the newest features of Hortonworks DataFlow, highlighting the new processors, the new user interface, edge intelligence powered by Apache MiNiFi, support for multi-tenancy, and the new zero-master clustering architecture.
Apache Spark 2.0 set the architectural foundations of structure in Spark, unified high-level APIs, structured streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Apache Spark 2.3 and 2.4 have made similar strides. In this talk we highlight some of the new features and enhancements, such as:
• Apache Spark and Kubernetes
• Native Vectorized ORC and SQL Cache Readers
• Pandas UDFs for PySpark
• Continuous Stream Processing
• Barrier Execution
• Avro/Image Data Source
• Higher-order Functions
Speaker: Robert Hryniewicz, AI Evangelist, Hortonworks
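The higher-order functions mentioned above let Spark SQL apply a lambda to each element of an array column (transform, filter, aggregate). As a rough, engine-free sketch of those semantics in plain Python (illustration only; in Spark these run inside the SQL engine, not as Python callbacks):

```python
# Plain-Python sketch of the semantics of Spark SQL's array higher-order
# functions: transform(arr, x -> ...), filter(arr, x -> ...), and
# aggregate(arr, init, (acc, x) -> ...).

def transform(arr, fn):
    """Like SQL: transform(arr, x -> fn(x))."""
    return [fn(x) for x in arr]

def filter_(arr, pred):
    """Like SQL: filter(arr, x -> pred(x))."""
    return [x for x in arr if pred(x)]

def aggregate(arr, init, merge):
    """Like SQL: aggregate(arr, init, (acc, x) -> merge(acc, x))."""
    acc = init
    for x in arr:
        acc = merge(acc, x)
    return acc

if __name__ == "__main__":
    prices = [10, 25, 40]
    print(transform(prices, lambda p: p * 2))        # [20, 50, 80]
    print(filter_(prices, lambda p: p > 20))         # [25, 40]
    print(aggregate(prices, 0, lambda a, p: a + p))  # 75
```

The appeal in Spark is that such per-element logic stays inside a single SQL expression instead of requiring an explode/group-by round trip.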
The Enterprise Data Lake has become the de facto repository for both structured and unstructured data within an enterprise. Being able to discover information across both kinds of data using search is a key capability of an enterprise data lake. In this workshop we will provide an in-depth overview of HDP Search, with a focus on configuration, sizing, and tuning. We will also deliver a working example that showcases the use of HDP Search alongside the rest of the platform's capabilities to deliver a real-world solution.
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise (DataWorks Summit)
In recent years, big data has moved from batch processing to stream-based processing, since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today, and the same trend that occurred in batch-based big data processing has taken place in the streaming world: nearly every streaming framework now supports higher-level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architectural option for building your next-generation ETL data pipeline in near real time. But what does it take to deploy and operationalize this architecture in an enterprise production environment?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
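The "end-to-end exactly-once" claim above rests on replayable sources plus idempotent (or transactional) sinks: a batch redelivered after a failure must not produce duplicates. A minimal, framework-free sketch of the idempotent-sink half of that contract (the class and keying scheme here are invented for illustration; Structured Streaming sinks implement this inside Spark):

```python
# Sketch of an idempotent sink: each write is keyed by (batch_id, record_key),
# so replaying a batch after a failure is a harmless upsert, not a duplicate.

class IdempotentSink:
    def __init__(self):
        self.table = {}  # (batch_id, key) -> value

    def write_batch(self, batch_id, records):
        for key, value in records:
            # Upsert: writing the same (batch_id, key) twice is a no-op.
            self.table[(batch_id, key)] = value

    def values(self):
        return sorted(self.table.values())

if __name__ == "__main__":
    sink = IdempotentSink()
    batch = [("user1", 10), ("user2", 20)]
    sink.write_batch(0, batch)
    sink.write_batch(0, batch)  # replayed after a simulated failure
    print(sink.values())        # [10, 20] -- no duplicates
```

Sinks without such a key (plain appends to a topic or file) only give at-least-once, which is part of "the whole story" the abstract hints at.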
Hortonworks Technical Workshop: Operations with Ambari (Hortonworks)
Ambari continues its journey of provisioning, monitoring, and managing enterprise Hadoop deployments. With release 2.0, Apache Ambari brings a host of new capabilities, including updated metrics collection, Kerberos setup automation, and developer views for big data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look at Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the GitHub demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Apache Hive is an enterprise data warehouse built on top of Hadoop. Hive supports INSERT/UPDATE/DELETE SQL statements with transactional semantics, and read operations that run at snapshot isolation. This talk will describe the intended use cases, the architecture of the implementation, new features such as the SQL MERGE statement, and recent improvements. The talk will also cover the Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL. This API is used by Apache NiFi, Storm, and Flume to stream data directly into Hive tables and make it visible to readers in near real time.
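To make the MERGE feature concrete, the sketch below assembles a Hive-style MERGE statement of the shape the talk describes. The table and column names (customers, staging, id, name, deleted) are invented for illustration; the clause structure follows Hive's documented MERGE syntax for ACID tables.

```python
# Hedged sketch: build a Hive SQL MERGE statement (ACID tables) as a string.
# Table and column names are illustrative, not from the talk.

def build_merge(target: str, source: str, key: str) -> str:
    return (
        f"MERGE INTO {target} AS t USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED AND s.deleted = true THEN DELETE\n"
        "WHEN MATCHED THEN UPDATE SET name = s.name\n"
        "WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name, false)"
    )

if __name__ == "__main__":
    print(build_merge("customers", "staging", "id"))
```

A single statement like this replaces the multi-step "full outer join and rewrite the partition" dance that pre-ACID Hive upserts required.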
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop and the Hortonworks Data Platform (HDP). Recent innovations have focused on evolving Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy third-party software, and provide custom operational and developer views to the end user. In this session Hortonworks will cover three key integration points of Apache Ambari, namely Stacks, Views, and Blueprints, and deliver working examples of each.
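Of the three integration points, Blueprints are the most self-describing: a blueprint is a JSON document naming a stack and a set of host groups with the components each should run. A minimal sketch (host-group names, cardinalities, and component choices below are illustrative; real blueprints are POSTed to the Ambari server's REST API):

```python
import json

# Hedged sketch of a minimal Ambari blueprint document. Host-group and
# component layout here is an invented example of the documented shape.

blueprint = {
    "Blueprints": {
        "blueprint_name": "small-cluster",
        "stack_name": "HDP",
        "stack_version": "2.3",
    },
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "workers", "cardinality": "3",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

if __name__ == "__main__":
    print(json.dumps(blueprint, indent=2))
```

Because the cluster layout lives in one declarative document, the same blueprint can stamp out identical dev, QA, and production clusters.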
This presentation was created as an introduction to the Apache NiFi project; to be followed by “Lab 0” of the “Realtime Event Processing in Hadoop with NiFi, Kafka and Storm” tutorial hosted here: http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/#section_1
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. The Ambari engineering team will talk about improvements being made to the automation, metrics, logging, upgrade, and other core frameworks within Ambari as the project is being re-imagined.
Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
As part of this talk, the engineering team will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari community has changed the product to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari is being re-imagined to handle these new challenges.
Speaker
Paul Codding, Product Management Director, Hortonworks
Oliver Szabo, Senior Software Engineer, Hortonworks
Apache Ambari is now the preferred way of provisioning, managing, and monitoring Hadoop clusters. Ambari helps users manage Hadoop clusters by simplifying actions such as upgrades, configuration management, and service management. Starting with release 2.0, Ambari supports automated Rolling Upgrades. Release 2.2.0.0 added support for Express Upgrades, which let users upgrade large-scale clusters faster, at the cost of cluster downtime.
This talk will cover planning and execution of Hadoop cluster upgrades from an operational perspective. It will also cover the internals of the upgrade process, including stages such as pre-upgrade, backup, service checks, configuration upgrades, and finalization. Finally, the talk will cover troubleshooting upgrade failures, monitoring services during upgrades, and post-upgrade actions. The presentation will conclude with a case study of how the upgrade process works on a large cluster, including aspects such as planning the upgrade, the time required for the various stages, and troubleshooting.
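The stages above are strictly ordered: an upgrade that skips backup or service checks cannot be safely finalized. A toy sketch of that ordering as a tiny state machine (the stage names mirror the talk's outline; real Ambari upgrades orchestrate these per service with pause and retry support):

```python
# Toy state machine for the ordered upgrade stages described above.
# Illustration only -- Ambari's real orchestration is far richer.

STAGES = ["pre-upgrade", "backup", "service-checks",
          "config-upgrade", "finalize"]

class Upgrade:
    def __init__(self):
        self.done = []

    def advance(self):
        """Run the next stage in order; stages cannot be skipped."""
        nxt = STAGES[len(self.done)]
        self.done.append(nxt)  # a real tool would run checks here
        return nxt

    def finalized(self):
        return self.done == STAGES

if __name__ == "__main__":
    up = Upgrade()
    while not up.finalized():
        print("running stage:", up.advance())
```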
Hortonworks Data In Motion Series Part 3: HDF Ambari (Hortonworks)
How To: Hortonworks DataFlow 2.0 with Ambari and Ranger for integrated installation, deployment and operations of Apache NiFi.
On demand webinar with demo: http://hortonworks.com/webinar/getting-goal-big-data-faster-enterprise-readiness-data-motion/
This talk will give an overview of two exciting releases, Apache HBase 2.0 and Phoenix 5.0. HBase provides a NoSQL column store on Hadoop for random, real-time read/write workloads. Phoenix provides SQL on top of HBase. HBase 2.0 contains a large number of features that were a long time in development, including rewritten region assignment, performance improvements (RPC, a rewritten write pipeline, etc.), async clients and WAL, a C++ client, off-heap memstore and other buffers, shading of dependencies, and many other fixes and stability improvements. We will go into detail on some of the most important improvements in the release, as well as the implications for users in terms of APIs and upgrade paths. Phoenix 5.0 is the next big Phoenix release because of its integration with HBase 2.0 and its many performance improvements, particularly around secondary indexes. It adds important new features such as encoded columns and Kafka and Hive integration. This session will also describe the use cases that HBase and Phoenix are a good architectural fit for.
Speaker: Alan Gates, Co-Founder, Hortonworks
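One classic HBase/Phoenix design concern behind "random, real-time read/write workloads" is region hotspotting on monotonically increasing row keys (timestamps, sequence IDs). Phoenix addresses it with salted tables (CREATE TABLE ... SALT_BUCKETS=n): a small hash-derived prefix spreads sequential keys across regions. A stdlib sketch of the idea (the md5-based hash here is illustrative; Phoenix uses its own internal hash):

```python
import hashlib

# Sketch of the salted-row-key idea used by Phoenix to avoid hotspotting:
# prefix each key with one byte derived from a hash of the key, mod the
# number of salt buckets, so sequential keys land in different regions.

SALT_BUCKETS = 8  # illustrative bucket count

def salted_key(row_key: bytes) -> bytes:
    salt = hashlib.md5(row_key).digest()[0] % SALT_BUCKETS
    return bytes([salt]) + row_key

if __name__ == "__main__":
    for ts in (b"2018-01-01", b"2018-01-02", b"2018-01-03"):
        k = salted_key(ts)
        print(k[0], k[1:])
```

The trade-off: point lookups stay cheap (the salt is recomputable from the key), but range scans must now fan out across all buckets.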
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. As Hive continues to grow its support for analytics, reporting, and interactive query, the community is hard at work in improving it along with many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, the extension of ACID semantics to non-ORC data, and workload management are some noteworthy new features.
We will discuss optimizations which provide major performance gains, including significantly improved performance for ACID tables. The talk will also provide a glimpse of what is expected to come in the near future.
Speaker: Alan Gates, Co-Founder, Hortonworks
Druid: Sub-Second OLAP Queries over Petabytes of Streaming Data (DataWorks Summit)
When interacting with analytics dashboards, the two key requirements for a smooth user experience are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we will present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, sub-second, ad hoc data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda:
1) Introduction and ideal use cases for Druid
2) Data architecture
3) Streaming ingestion with Kafka
4) Demo using Druid, Kafka, and Superset
5) Recent improvements in Druid: moving from a lambda architecture to exactly-once ingestion
6) Future work
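The ad hoc queries mentioned above are JSON documents in Druid's native query language, posted to a broker node. A hypothetical timeseries query (the datasource and metric names are invented for illustration; the field structure follows Druid's documented native query format):

```python
import json

# Hedged sketch of a Druid native timeseries query. "page_events" and the
# metric names are invented; real queries are POSTed to the broker's
# /druid/v2 endpoint.

query = {
    "queryType": "timeseries",
    "dataSource": "page_events",
    "granularity": "minute",
    "intervals": ["2018-06-01T00:00:00Z/2018-06-02T00:00:00Z"],
    "aggregations": [
        {"type": "count", "name": "events"},
        {"type": "doubleSum", "fieldName": "bytes", "name": "total_bytes"},
    ],
}

if __name__ == "__main__":
    print(json.dumps(query, indent=2))
```

Because aggregations are pre-declared per time bucket, the broker can answer this from column-oriented, pre-indexed segments, which is where the sub-second latency comes from.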
In my talk I will discuss and show examples of using Apache Hadoop, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications.
As part of my talk I will walk through using pre-built Apache MXNet models, MXNet's new Model Server with Apache NiFi, executing MXNet from Apache NiFi, and running Apache MXNet on edge nodes utilizing Python and Apache MiNiFi.
This talk is geared toward data engineers interested in the basics of deep learning with open source Apache tools in a big data environment. I will walk through source code examples available on GitHub and run the code live on an Apache Hadoop / YARN / Apache Spark cluster.
This will be an introduction to executing Deep Learning Pipelines in an Apache Big Data environment.
My talk at DataWorks Summit Sydney was listed in the top 7: https://hortonworks.com/blog/7-sessions-dataworks-summit-sydney-see/
I also run Future of Data Princeton and have spoken at Oracle Code NYC.
Ref:
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
https://dzone.com/refcardz/introduction-to-tensorflow
Speaker
Timothy Spann, Solutions Engineer, Hortonworks
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager (DataWorks Summit)
Running scheduled, long-running, or repetitive workflows on Hadoop clusters, especially secure clusters, is the domain of Apache Oozie. Oozie, however, suffers from XML job configuration and a dated UI -- poor usability all around. Apache Ambari, in its quest to make cluster management easier, has branched out into offering views for user services. This talk covers the Ambari Workflow Manager view, which provides a GUI to author and visualize Oozie jobs.
To provide an example of Workflow Manager, Oozie jobs for log management and HBase compactions will be demonstrated showing off how easy Oozie can now be and what the exciting future for Oozie and Workflow Manager holds.
Apache Oozie is the long-time incumbent for scheduled workflow processing in big data. It is known to be hard to use, and its interface is not aesthetically pleasing -- Oozie suffers from a dated UI. However, for secure Hadoop clusters, Oozie is the most readily available, obvious, and full-featured solution.
Apache Ambari is a deployment and configuration management tool used to deploy Hadoop clusters. Ambari Workflow Manager is a new Ambari view that helps address the usability and UI appeal of Apache Oozie.
In this talk, we’re going to leverage the stable foundation of Apache Oozie and clarity of Workflow Manager to demonstrate how one can build powerful batch workflows on top of Apache Hadoop. We’re also going to cover future roadmap and vision for both Apache Oozie and Workflow Manager. We will finish off with a live demo of Workflow Manager in action.
Speakers: Artem Ervits, Solutions Engineer, Hortonworks and Clay Baenziger, Hadoop Infrastructure, Bloomberg
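The XML that Workflow Manager hides from the user is a workflow-app document: a start node, actions wired together by ok/error transitions, a kill node, and an end node. A sketch that generates the minimal shape (the workflow and action names are invented; the action body, e.g. a shell or spark element, is omitted for brevity, so this is illustrative rather than a submittable job):

```python
import xml.etree.ElementTree as ET

# Sketch of the minimal Oozie workflow.xml shape that Workflow Manager
# authors graphically. Action contents are omitted; names are illustrative.

def minimal_workflow(name: str, action: str) -> str:
    wf = ET.Element("workflow-app",
                    {"xmlns": "uri:oozie:workflow:0.5", "name": name})
    ET.SubElement(wf, "start", {"to": action})
    act = ET.SubElement(wf, "action", {"name": action})
    ET.SubElement(act, "ok", {"to": "end"})       # success transition
    ET.SubElement(act, "error", {"to": "fail"})   # failure transition
    kill = ET.SubElement(wf, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Action failed"
    ET.SubElement(wf, "end", {"name": "end"})
    return ET.tostring(wf, encoding="unicode")

if __name__ == "__main__":
    print(minimal_workflow("log-cleanup", "cleanup"))
```

Even this tiny example shows why a GUI helps: every edge of the job graph is an explicit XML transition the author must keep consistent by hand.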
Dataflow Management From Edge to Core with Apache NiFi (DataWorks Summit)
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and dataflow managers interested in learning about and improving their dataflows. No prior experience designing or modifying dataflows is required.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
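The sandbox-to-production promotion problem mentioned above ("those database passwords aren't the same") is solved in NiFi by keeping flow definitions parameterized and supplying per-environment values (via parameter contexts or the variable registry). A stdlib sketch of the substitution idea (the connection string, parameter names, and environments below are invented; the $name syntax is Python's string.Template, not NiFi's expression language):

```python
from string import Template

# Sketch of environment promotion: the flow references named parameters,
# and each environment supplies its own values. Names are illustrative.

FLOW = "jdbc:postgresql://$db_host:5432/events?user=$db_user"

ENVIRONMENTS = {
    "sandbox": {"db_host": "localhost", "db_user": "dev"},
    "prod": {"db_host": "db.internal", "db_user": "etl_svc"},
}

def render(env: str) -> str:
    """Fill the flow's parameters from one environment's value set."""
    return Template(FLOW).substitute(ENVIRONMENTS[env])

if __name__ == "__main__":
    print(render("sandbox"))  # jdbc:postgresql://localhost:5432/events?user=dev
    print(render("prod"))     # jdbc:postgresql://db.internal:5432/events?user=etl_svc
```

The same versioned flow definition then moves unchanged from sandbox to QA to production; only the environment's value set differs.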
http://hortonworks.com/hadoop/spark/
Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=03debab5ba04b34a033dc5c2f03c7967
As the ratio of memory to processing power rapidly evolves, many within the Hadoop community are gravitating toward Apache Spark for fast, in-memory data processing. With YARN, they run Spark for machine learning and data science use cases alongside other workloads simultaneously. This is a continuation of our YARN Ready series, aimed at helping developers learn the different ways to integrate with YARN and Hadoop. Tools and applications that are YARN Ready have been verified to work within YARN.
The Enterprise Data Lake has become the defacto repository of both structured and unstructured data within an enterprise. Being able to discover information across both structured and unstructured data using search is a key capability of enterprise data lake. In this workshop, we will provide an in-depth overview of HDP Search with focus on configuration, sizing and tuning. We will also deliver a working example to showcase the usage of HDP Search along with the rest of platform capabilities to deliver real world solution.
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing since no one wants to wait hours or days to gain insights. Dozens of stream processing frameworks exist today and the same trend that occurred in the batch-based big data processing realm has taken place in the streaming world so that nearly every streaming framework now supports higher level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next generation ETL data pipeline in near real time. What does this look like in an enterprise production environment to deploy and operationalized?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
Hortonworks technical workshop operations with ambariHortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Apache Hive is an Enterprise Data Warehouse build on top of Hadoop. Hive supports Insert/Update/Delete SQL statements with transactional semantics and read operations that run at Snapshot Isolation. This talk will describe the intended use cases, architecture of the implementation, new features such as SQL Merge statement and recent improvements. The talk will also cover Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL. This API is used by Apache NiFi, Storm and Flume to stream data directly into Hive tables and make it visible to readers in near real time.
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop and Hortonworks Data Platform (HDP). Recent innovations of Apache Ambari have focused on opening Apache Ambari into a pluggable management platform that can automate cluster provisioning, deploy 3rd party software and provide custom operational and developers views to the end user. In this session Hortonworks will cover 3 key integration points of Apache Ambari including Stacks, Views and Blueprints and deliver working examples of each.
This presentation was created as an introduction to the Apache NiFi project; to be followed by “Lab 0” of the “Realtime Event Processing in Hadoop with NiFi, Kafka and Storm” tutorial hosted here: http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/#section_1
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. The Ambari engineering team will talk about improvements being made to the automation, metrics, logging, upgrade, and other core frameworks within Ambari as the project is being re-imagined.
Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
As part of this talk, the engineering team will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari community has changed the product to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari is being re-imagined to handle these new challenges.
Speaker
Paul Codding, Product Management Director, Hortonworks
Oliver Szabo, Senior Software Engineer, Hortonworks
Apache Ambari is now the preferred way of provisioning, managing and monitoring Hadoop Clusters. Ambari helps users to manage Hadoop clusters simplifying actions such as upgrades, configuration management, service management, etc. From release 2.0, Ambari started supporting automated Rolling Upgrades. This was further enhanced with release 2.2.0.0 to include support for Express Upgrades, which allows users to upgrade large scale clusters faster but requiring cluster downtime.
This talk will cover planning and execution of Hadoop cluster upgrades from an operational perspective. The talk will also cover the internals of the upgrade process including the various stages such as pre-upgrade, backup, service checks, configuration upgrades, and finalization. Finally, the talk will cover troubleshooting upgrade failures, monitoring services during upgrades and post upgrade actions. The presentation will conclude with a case study that will cover how the upgrade process works on a large cluster (including aspects such as planning the upgrade, the amount of time required for the various stages, and troubleshooting)
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks
How To: Hortonworks DataFlow 2.0 with Ambari and Ranger for integrated installation, deployment and operations of Apache NiFi.
On demand webinar with demo: http://hortonworks.com/webinar/getting-goal-big-data-faster-enterprise-readiness-data-motion/
This talk will give an overview of two exciting releases for Apache HBase 2.0 and Phoenix 5.0. HBase provides a NoSQL column store on Hadoop for random, real-time read/write workloads. Phoenix provides SQL on top of HBase. HBase 2.0 contains a large number of features that were a long time in development, including rewritten region assignment, performance improvements (RPC, rewritten write pipeline, etc), async clients and WAL, a C++ client, offheaping memstore and other buffers, shading of dependencies, as well as a lot of other fixes and stability improvements. We will go into details on some of the most important improvements in the release, as well as what are the implications for the users in terms of API and upgrade paths. Phoenix 5.0 is the next big Phoenix release because of its integration with HBase 2.0 and a lot of performance improvements in support of secondary Indexes. It has many important new features such as encoded columns, Kafka and Hive integration, and many other performance improvements. This session will also describe the uses cases that HBase and Phoenix are a good architectural fit for.
Speaker: Alan Gates, Co-Founder, Hortonworks
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. As Hive continues to grow its support for analytics, reporting, and interactive query, the community is hard at work in improving it along with many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, the extension of ACID semantics to non-ORC data, and workload management are some noteworthy new features.
We will discuss optimizations which provide major performance gains, including significantly improved performance for ACID tables. The talk will also provide a glimpse of what is expected to come in the near future.
Speaker: Alan Gates, Co-Founder, Hortonworks
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
When interacting with analytics dashboards in order to achieve a smooth user experience, two major key requirements are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/Hbase work well for storing large volumes of data, although they are not optimized for ingesting streaming data and making it available for queries in realtime. Also, long query latencies make these systems sub-optimal choices for powering interactive dashboards and BI use-cases.
In this talk we will present Druid as a complementary solution to existing hadoop based technologies. Druid is an open-source analytics data store, designed from scratch, for OLAP and business intelligence queries over massive data streams. It provides low latency realtime data ingestion and fast sub-second adhoc flexible data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid moving from lambda architecture to Exactly once Ingestion
6) Future Work
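The roll-up that happens at ingestion time is the core reason Druid can answer aggregations in sub-second time. The following is a plain-Python sketch of that idea only, not Druid code; the event fields and granularity are made up for illustration:

```python
# Hypothetical sketch of ingest-time roll-up: events sharing a
# (time-bucket, dimension) key are pre-aggregated once at ingest,
# so queries scan far fewer stored rows.
from collections import defaultdict

def rollup(events, granularity_s=60):
    """Pre-aggregate raw events into (time bucket, page) -> summed metrics."""
    table = defaultdict(lambda: {"views": 0})
    for ev in events:
        bucket = ev["ts"] - ev["ts"] % granularity_s  # truncate to the minute
        table[(bucket, ev["page"])]["views"] += ev["views"]
    return dict(table)

events = [
    {"ts": 100, "page": "/home", "views": 1},
    {"ts": 110, "page": "/home", "views": 2},
    {"ts": 200, "page": "/docs", "views": 1},
]
table = rollup(events)
# Three raw events collapse into two stored rows.
```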
In my talk I will discuss and show examples of using Apache Hadoop, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications.
As part of my talk I will walk through using Apache MXNet pre-built models, MXNet's new Model Server with Apache NiFi, executing MXNet with Apache NiFi, and running Apache MXNet on edge nodes utilizing Python and Apache MiNiFi.
This talk is geared towards Data Engineers interested in the basics of Deep Learning with open source Apache tools in a Big Data environment. I will walk through source code examples available on GitHub and run the code live on an Apache Hadoop / YARN / Apache Spark cluster.
This will be an introduction to executing Deep Learning Pipelines in an Apache Big Data environment.
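As a hedged sketch of the model-serving step: MXNet's Model Server exposes REST prediction endpoints, and the host, port, model name, and payload below are assumptions for illustration. The snippet only builds the request; sending it requires a running server.

```python
# Build (but do not send) an inference request of the kind a model-serving
# REST endpoint accepts. Host, port, and model name are illustrative.
import urllib.request

def build_predict_request(image_bytes, host="localhost", port=8080,
                          model="resnet18"):
    url = f"http://{host}:{port}/predictions/{model}"
    return urllib.request.Request(
        url,
        data=image_bytes,                                  # raw image payload
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

req = build_predict_request(b"\x89PNG...")  # placeholder bytes, not a real image
```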
My talk at DataWorks Summit Sydney was listed in the top 7 -> https://hortonworks.com/blog/7-sessions-dataworks-summit-sydney-see/
I have also spoken at and run Future of Data Princeton, and spoken at Oracle Code NYC.
Ref:
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
https://dzone.com/refcardz/introduction-to-tensorflow
Speaker
Timothy Spann, Solutions Engineer, Hortonworks
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager (DataWorks Summit)
Running scheduled, long-running or repetitive workflows on Hadoop clusters, especially secure clusters, is the domain of Apache Oozie. Oozie, however, suffers from XML for job configuration and a dated UI -- poor usability all around. Apache Ambari, in its quest to make cluster management easier, has branched out into offering views for user services. This talk covers the Ambari Workflow Manager view, which provides a GUI to author and visualize Oozie jobs.
To provide an example of Workflow Manager, Oozie jobs for log management and HBase compactions will be demonstrated showing off how easy Oozie can now be and what the exciting future for Oozie and Workflow Manager holds.
Apache Oozie is the long-time incumbent in big data processing. It is known to be hard to use and the interface is not aesthetically pleasing -- Oozie suffers from a dated UI. However, for secure Hadoop clusters, Oozie is the most readily available, obvious and full featured solution.
Apache Ambari is a deployment and configuration management tool used to deploy Hadoop clusters. Ambari Workflow Manager is a new Ambari view that helps address the usability and UI appeal of Apache Oozie.
In this talk, we’re going to leverage the stable foundation of Apache Oozie and clarity of Workflow Manager to demonstrate how one can build powerful batch workflows on top of Apache Hadoop. We’re also going to cover future roadmap and vision for both Apache Oozie and Workflow Manager. We will finish off with a live demo of Workflow Manager in action.
Speakers: Artem Ervits, Solutions Engineer, Hortonworks and Clay Baenziger, Hadoop Infrastructure, Bloomberg
Dataflow Management From Edge to Core with Apache NiFi (DataWorks Summit)
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
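As a toy illustration of the flow concepts above (prioritization, routing, and transformation), here is a plain-Python sketch; it is not NiFi code, and the attribute names are invented:

```python
# A minimal "flowfile" pipeline: each record carries a priority, bad
# records are routed out, and content is transformed before delivery.
import heapq

def run_flow(flowfiles):
    # Prioritize: lower number = more urgent (like NiFi's prioritizers).
    # The index i breaks ties so dicts are never compared directly.
    queue = [(ff["priority"], i, ff) for i, ff in enumerate(flowfiles)]
    heapq.heapify(queue)
    delivered = []
    while queue:
        _, _, ff = heapq.heappop(queue)
        if ff["content"] is None:                 # route invalid records away
            continue
        ff["content"] = ff["content"].upper()     # a trivial transform step
        delivered.append(ff)
    return delivered

out = run_flow([
    {"priority": 2, "content": "log line"},
    {"priority": 1, "content": "alert"},
    {"priority": 3, "content": None},
])
```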
http://hortonworks.com/hadoop/spark/
Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=03debab5ba04b34a033dc5c2f03c7967
As the ratio of memory to processing power rapidly evolves, many within the Hadoop community are gravitating towards Apache Spark for fast, in-memory data processing. And with YARN, they use Spark for machine learning and data science use cases alongside other workloads simultaneously. This is a continuation of our YARN Ready Series, aimed at helping developers learn the different ways to integrate with YARN and Hadoop. Tools and applications that are YARN Ready have been verified to work within YARN.
Boost Performance with Scala – Learn From Those Who’ve Done It! (Hortonworks)
Scalding is a scala DSL for Cascading. Run on Hadoop, it’s a concise, functional, and very efficient way to build big data applications. One significant benefit of Scalding is that it allows easy porting of Scalding apps from MapReduce to newer, faster execution fabrics.
In this webinar, Cyrille Chépélov, of Transparency Rights Management, will share how his organization boosted the performance of their Scalding apps by over 50% by moving away from MapReduce to Cascading 3.0 on Apache Tez. Dhruv Kumar, Hortonworks Partner Solution Engineer, will then explain how you can interact with data on HDP using Scala and leverage Scala as a programming language to develop Big Data applications.
Slides from a joint webinar. Pivotal HAWQ, one of the world’s most advanced enterprise SQL on Hadoop technologies, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your data science efforts.
Together, Pivotal HAWQ and the Hortonworks Data Platform provide businesses with a Modern Data Architecture for IT transformation.
YARN Webinar Series: Using Scalding to Write Applications for Hadoop and YARN (Hortonworks)
This webinar focuses on introducing Scalding for developers and writing applications for Hadoop and YARN using Scalding. Guest speaker Jonathan Coveney from Twitter provides an overview, use cases, limitations, and core concepts.
Apache Ambari: Managing Hadoop and YARN (Hortonworks)
Part of the Hortonworks YARN Ready Webinar Series, this session is about management of Apache Hadoop and YARN using Apache Ambari. This series targets developers and we will feature a demo on Ambari.
As Hadoop becomes the de facto big data platform, enterprises deploy HDP across a wide range of physical and virtual environments spanning private and public clouds. This session will cover key considerations for cloud deployment and showcase Cloudbreak for simple and consistent deployment across the cloud providers of your choice.
Hortonworks Tech Workshop: In-Memory Processing with Spark (Hortonworks)
Apache Spark offers unique in-memory capabilities and is well suited to a wide variety of data processing workloads including machine learning and micro-batch processing. With HDP 2.2, Apache Spark is a fully supported component of the Hortonworks Data Platform. In this session we will cover the key fundamentals of Apache Spark and operational best practices for executing Spark jobs along with the rest of Big Data workloads. We will also provide a working example to showcase micro-batch and machine learning processing using Apache Spark.
Hortonworks Technical Workshop - Operational Best Practices Workshop (Hortonworks)
The Hortonworks Data Platform is a key component of the Modern Data Architecture. Organizations rely on HDP for mission-critical business functions and expect the system to be constantly available and performant. In this session we will cover operational best practices for administering the Hortonworks Data Platform, including the initial setup and ongoing maintenance.
Hortonworks Technical Workshop - Build a YARN Ready Application with Apache ... (Hortonworks)
YARN has fundamentally transformed the Hadoop landscape. It has opened Hadoop up from a single-workload system to one that can now support a multitude of fit-for-purpose processing engines. In this workshop we will provide an overview of Apache Slider, which enables custom applications to run natively in the cluster as YARN Ready applications. The workshop will include working examples and provide an overview of work being pursued in the community around YARN Docker integration.
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop (Hortonworks)
Beginning with HDP 2.1, Hortonworks Data Platform ships with Apache Falcon for Hadoop data governance. Himanshu Bari, Hortonworks senior product manager, and Venkatesh Seetharam, Hortonworks co-founder and committer to Apache Falcon, lead this 30-minute webinar, including:
+ Why you need Apache Falcon
+ Key new Falcon features
+ Demo: Defining data pipelines with replication; policies for retention and late data arrival; managing Falcon server with Ambari
Deep Learning with Hortonworks and Apache Spark - Hortonworks Technical Workshop (Hortonworks)
Rich media is exploding all around us. From our personal usage to retailers monitoring store traffic for optimized associate placement, there is wide and growing application of rich media. Despite the pervasive usage, enterprises have had limited choice of generally available tools to analyze rich media. In this session we will look into leveraging deep learning algorithms for rich media analysis and provide practical hands on example of image recognition using Apache Hadoop and Spark.
Beyond SQL: Speeding up Spark with DataFrames (Databricks)
In this talk I describe how you can use Spark SQL DataFrames to speed up Spark programs, even without writing any SQL. By writing programs using the new DataFrame API you can write less code, read less data and let the optimizer do the hard work.
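To illustrate the "read less data" point without a Spark cluster, here is a minimal pure-Python sketch of pushing a predicate down into the scan so filtered-out rows are never materialized downstream. This shows the principle only; it is not Spark's optimizer:

```python
# Predicate pushdown in miniature: the filter runs inside the scan,
# so downstream operators never see rows that fail the predicate.
def scan(rows, predicate=None):
    for row in rows:
        if predicate is None or predicate(row):
            yield row          # only matching rows flow downstream

rows = [{"age": a} for a in (10, 25, 40)]
adults = list(scan(rows, predicate=lambda r: r["age"] >= 18))
```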
Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe (Flip Kromer)
This talk centers on two things: a set of patterns for the architecture of high-scale data systems; and a framework for understanding the tradeoffs we make in designing them.
Just because you can, doesn’t mean you should. But in this case, you definitely should! Learn how this one weird trick (Jinja templating) will supercharge your analytics workflows and help you do more, better, faster with SQL.
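To make the templating idea concrete without installing Jinja, here is a dependency-free sketch using the standard library's string.Template as a stand-in for Jinja templating; the table and column names are made up:

```python
# One parameterized SQL template generates many similar queries,
# instead of hand-editing near-identical SQL statements.
from string import Template

sql = Template(
    "SELECT $metric, count(*) AS n FROM events "
    "WHERE ds = '$ds' GROUP BY $metric"
)

queries = [sql.substitute(metric=m, ds="2015-04-01")
           for m in ("country", "device")]
```

Jinja adds loops and conditionals on top of this basic substitution, which is what makes templated SQL workflows scale.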
The iOS team at The Washington Post needed to grow quickly and maintain sanity (and shipping quality). Here are some of the key workflow tools that we used.
Framing the Argument: How to Scale Faster with NoSQL (Inside Analysis)
The Briefing Room with Dr. Robin Bloor and IBM Cloudant
Live Webcast March 24, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e8bf62408d47e76c43aa73be08377e41c
Context matters. Perspective matters. Thinking outside the box? That's often the key! While the Structured Query Language remains the lingua franca of data, there are some views of the world that are best rendered with the benefit of NoSQL engines. As usual, that's easier said than done. How can your organization migrate from structured queries to an unstructured or semi-structured query language?
Register for this episode of The Briefing Room to find out! Veteran Analyst Dr. Robin Bloor will provide a detailed assessment of serious considerations when using NoSQL engines in conjunction with SQL. He'll be briefed by Ryan Millay of IBM Cloudant, who will showcase his company's solution, and how it's addressing the more vexing challenges facing today's information managers.
Visit InsideAnalysis.com for more information.
[System design] Design a tweeter-like system (Aree Oh)
Presented as part of a step-by-step system design study session.
As an interviewee, what should I do during a system design interview?
There are five steps.
Understanding the requirements of the system is the first step every interviewee should take.
The next step is to design the interface and data model.
Based on the interface and requirements, the interviewee draws an initial system design.
The interviewee should then improve the system's qualities by asking the right questions (scalability, availability, load balancing, caching, …).
By the end of the interview, you should have the final architecture of the system.
This presentation will go through the above steps with an example scenario: designing a Twitter-like system.
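One classic trade-off in this scenario is fan-out-on-write versus fan-out-on-read. Below is a minimal sketch of the write-time approach, where each tweet is pushed to followers' timelines at post time so reads become a cheap list lookup (the names and data are invented for illustration):

```python
# Fan-out-on-write: pay a write cost per follower so timeline reads
# are a simple precomputed-list lookup.
from collections import defaultdict

followers = {"alice": {"bob", "carol"}}   # alice is followed by bob and carol
timelines = defaultdict(list)             # per-user precomputed timeline

def post_tweet(user, text):
    for f in sorted(followers.get(user, ())):  # write amplification happens here
        timelines[f].append((user, text))

post_tweet("alice", "hello")
```

The opposite choice, fan-out-on-read, assembles the timeline at query time; real systems often mix both (e.g., special-casing accounts with millions of followers).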
Alternative microservices - one size doesn't fit all (Jeppe Cramon)
A different look at microservices. Is there only one way to approach microservices? What are the pitfalls? What protocols and communication patterns should we use? How do we discover service boundaries? What role do applications and the UI play?
Developing Connections plug-ins and applications is full of "What the??" moments, from which browser technologies and versions are supported through to common functions working in different ways in different parts of Connections. Any of these can put a real dent in your delivery date, but most are easy to cure and avoid with a little hindsight and knowledge. Here is that knowledge for you to take home and help you deliver on time.
The Amino Analytical Framework - Leveraging Accumulo to the Fullest (Donald Miner)
Speaker: Steve Touw, CTO, 42six Solutions, a CSC Company
Amino is an open source analytical framework that focuses on a “building-blocks” approach to data discovery by pre-computing features about data at the most granular level possible and then allows analysts and data scientists to easily combine those features into more complex questions.
The magic behind Amino is found in its custom Accumulo index; that index strives to provide fast scans, highly dimensional scans, data compression, and a simple query structure. The index leverages Accumulo iterators to do much of the scan-time logic, which places no limit on the dimensionality of the query. Iterators are what makes Accumulo unique, and they enable the Amino index to execute these complex queries.
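The building-blocks idea can be sketched in plain Python: pre-compute per-record features once, index feature -> record ids, and answer multi-feature questions by set intersection. This is only the concept; Amino's actual implementation lives in a custom Accumulo index, and the feature names below are invented:

```python
# Feature index sketch: features are computed once at ingest and
# indexed; compound questions become set intersections at query time.
from collections import defaultdict

index = defaultdict(set)

def ingest(record_id, features):
    for f in features:
        index[f].add(record_id)

def query(*features):
    """Records having ALL of the given features."""
    sets = [index[f] for f in features]
    return set.intersection(*sets) if sets else set()

ingest("r1", {"has_email", "us_based"})
ingest("r2", {"has_email"})
```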
Marjorie M. K. Hlava, President, Chair of the Board, and Chief Scientist, Access Innovations, Inc.
During this annual highlight of the DHUG meetings, Margie will discuss the exciting new changes and additions to the Data Harmony software. She will be joined by some members of our software development team to talk about specific initiatives we have worked on over the past year.
Modern Web technologies (and why you should care): Megacomm, Jerusalem, Febru... (Reuven Lerner)
My talk from the Megacomm 2012 conference in Jerusalem, on February 16th, 2012. I describe the fundamental underpinnings of the Web, how things have changed on both the browser and server sides, and what these technologies mean for users.
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level (Hortonworks)
The HDF 3.3 release delivers several exciting enhancements and new features. But the most noteworthy of them is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy (Hortonworks)
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with Cloudbreak (Hortonworks)
Cloudbreak, a part of Hortonworks Data Platform (HDP), simplifies provisioning and cluster management within any cloud environment to help your business on its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log Events (Hortonworks)
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys (Hortonworks)
Cybersecurity today is a big data problem. There’s a ton of data landing on you faster than you can load it, let alone search it. To make sense of it, we need to act on data-in-motion, using both machine learning and the most advanced pattern recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient and helps them find the hidden gems, or bombs, in masses of logs and packets.
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging Manager (Hortonworks)
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical Environments (Hortonworks)
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data Landscape (Hortonworks)
Last year, IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership's accomplishments and the conjoined road ahead with industry-leading analytics offerings.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
In this exclusive Premier Inside Out, you will hear from Druid committer and Staff Software Engineer Slim Bouguerra and Product Manager Will Xu. These Hortonworkers will explain the vision of these components, review new features, share some best practices and answer your questions.
View the webinar here: https://hortonworks.com/webinar/hortonworks-premier-apache-druid/
Accelerating Data Science and Real Time Analytics at Scale (Hortonworks)
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
Time Series: Applying Advanced Analytics to Industrial Process Data (Hortonworks)
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team up to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ... (Hortonworks)
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense (Hortonworks)
For years, the healthcare industry has had problems of data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades' worth of clinical expertise. Clearsense is delivering smart, real-time streaming data to its healthcare customers, enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with Ease (Hortonworks)
Every division in an organization builds its own database to keep track of its business. As the organization grows, those individual databases grow as well. The data in each database becomes siloed, disconnected from the data in the other databases.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data Management (Hortonworks)
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experience and create new products, yet these projects often stall in the ideation phase due to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategy to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features (Hortonworks)
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today’s enterprises. More than 20 billion IoT devices are active on the planet today, and thousands of use cases across IIoT, healthcare and manufacturing warrant capturing data-in-motion and delivering actionable intelligence right now. “Data decay” happens in a matter of seconds in today’s digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A... (Hortonworks)
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDC (Hortonworks)
Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data. It provides an end-to-end platform that can collect, curate, analyze, and act on data in real time, on-premises or in the cloud, with a drag-and-drop visual interface. It is being used across industries on large amounts of data that had been stored in isolation, which made collaboration and analysis difficult.
Join industry experts from Hortonworks and Attunity as they explain how Apache NiFi and streaming CDC technology provides a distributed, resilient platform for unlocking the value of data in new ways.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing toolchains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Metaverse and AI: how can decision-makers harness the Metaverse for their... (Jen Stirrup)
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Apache HBase is a NoSQL database built natively on Hadoop and HDFS.
HBase scales horizontally, so you can store and manage huge datasets with great performance and low cost.
HBase caches hot data in memory so data access happens in milliseconds.
HBase offers a flexible schema: you decide your schema at read or write time, so HBase is great for dealing with messy, multistructured data.
HBase offers both SQL and NoSQL APIs: NoSQL through HBase's native interface, or SQL through Apache Phoenix, a SQL layer that runs on top of HBase.
Finally, because HBase is native to Hadoop, data in HBase can be processed in MapReduce, Tez or any of the dozens of other tools in the Hadoop analytics world.
HBase is used by some of the biggest web companies, like Facebook, which uses it for its Messages and Nearby Friends features, and eBay, which uses it for search indexing.
If you're new to HBase and want to learn more, check out hortonworks.com/hadoop/hbase.
See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
Table == Sorted map of maps (like an OrderedDictionary or TreeMap; it's all just bytes!)
Access by coordinates: rowkey, column family, column qualifier, timestamp
Basic KV operations: GET, PUT, DELETE
Complex query: SCAN over a rowkey range (remember, rowkeys are ordered; *this* is schema design)
INCREMENT, APPEND, CheckAnd{Put,Delete} (server-side atomic; each requires a lock and can be contentious)
NO: secondary indices, joins, multi-row transactions
Column-family oriented.
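The "sorted map of maps" model above can be sketched in a few lines of plain Python. This is a toy illustration, not the real HBase client API; in real HBase every key and value is raw bytes, and the sort happens server-side on write.

```python
# Toy model of an HBase table: a sorted map of maps, addressed by
# the coordinates (rowkey, column family, column qualifier, timestamp).
import bisect


class ToyTable:
    def __init__(self):
        self._rows = {}      # rowkey -> {family -> {qualifier -> {ts: value}}}
        self._rowkeys = []   # rowkeys kept sorted on write (write-side sort)

    def put(self, rowkey, family, qualifier, value, ts=0):
        if rowkey not in self._rows:
            bisect.insort(self._rowkeys, rowkey)
            self._rows[rowkey] = {}
        cf = self._rows[rowkey].setdefault(family, {})
        cf.setdefault(qualifier, {})[ts] = value

    def get(self, rowkey, family, qualifier):
        # Return the newest version stored at the coordinate, if any.
        versions = self._rows.get(rowkey, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None

    def scan(self, start, stop):
        # SCAN over a rowkey range [start, stop): because rowkeys are
        # ordered, choosing the rowkey layout *is* schema design.
        lo = bisect.bisect_left(self._rowkeys, start)
        hi = bisect.bisect_left(self._rowkeys, stop)
        return [(rk, self._rows[rk]) for rk in self._rowkeys[lo:hi]]


t = ToyTable()
t.put("user#001", "info", "name", "alice", ts=1)
t.put("user#001", "info", "name", "alicia", ts=2)  # newer version wins on read
t.put("user#002", "info", "name", "bob", ts=1)
print(t.get("user#001", "info", "name"))                 # -> alicia
print([rk for rk, _ in t.scan("user#000", "user#002")])  # -> ['user#001']
```

Note how the range scan never touches rows outside [start, stop): that locality, driven entirely by rowkey design, is what the "this is schema design" remark refers to.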
Records ordered by rowkey (write-side sort, an application-visible feature)
Contiguous sequences of rows are partitioned into Regions
Regions are automatically distributed around the cluster ((mostly) hands-free partition management)
Regions automatically split when they grow too large (split by size in bytes, on a row boundary)
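The split behavior above can be sketched as follows. This is a simplified illustration with hypothetical row sizes; real HBase uses configurable split policies, but the key invariant is the same: a split always lands on a row boundary, so a single row never straddles two regions.

```python
# Sketch of size-based region splitting on a row boundary.
def split_region(rows, max_bytes):
    """rows: list of (rowkey, size_in_bytes), already sorted by rowkey.
    If the region's total size exceeds max_bytes, split it roughly in
    half -- always between rows, never inside one."""
    total = sum(size for _, size in rows)
    if total <= max_bytes:
        return [rows]                # small enough: no split needed
    running, split_at = 0, len(rows) - 1
    for i, (_, size) in enumerate(rows):
        running += size
        if running >= total / 2:
            split_at = i + 1         # boundary just after this row
            break
    return [rows[:split_at], rows[split_at:]]


region = [("a", 40), ("b", 40), ("c", 40), ("d", 40)]
left, right = split_region(region, max_bytes=100)
print([rk for rk, _ in left], [rk for rk, _ in right])  # -> ['a', 'b'] ['c', 'd']
```

After a split, each daughter region covers a contiguous rowkey range, and the balancer can move either one to another node independently.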
To start off we'll talk about how HBase High Availability has gotten substantially better over the past 18 months.
From the beginning, HBase offered two levels of protection to ensure high availability.
First, HBase partitions data across multiple nodes, making each node responsible for ranges of the overall dataset held within HBase. Before HBase HA, if you lost a node you lost access only to the data on that node; all other data in the database could still be read and written. This is indicated with point (1) here.
Second, HBase stores all its data in HDFS, so that data is highly available; if a node is truly lost, all HBase needs to do is spend a few minutes recovering that data on one of the remaining nodes. That's indicated with point (2).
But what happens during that recovery process? During the few minutes it takes to recover, data on that node can't be read or written; it's unavailable. For many apps this is acceptable, and plenty of HBase production applications have met 99.9% uptime with this system.
But some applications need better HA guarantees, which led to HBase HA.
HBase HA adds a 3rd layer of protection by replicating data to multiple regionservers in the cluster.
With HBase HA you have primary and standby regionservers; each key range is held on more than one server, so even if you lose a single server, all of its data remains available for reads.
HBase HA uses an HA model called timeline consistent read replicas.
With HBase HA all writes are still handled exclusively by the primary, so you still get strong consistency for updates and operations like increments.
Replication is asynchronous, so data on standby regionservers may be stale relative to the primary. Usually the replicas agree within less than a second, but if the system is busy they can lag the primary by several seconds.
HBase clients can now decide whether they need strong consistency or are willing to sacrifice it on reads for better availability. This choice can be made on a per-get or per-scan basis.
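The trade-off can be illustrated with a toy simulation. The names below (Consistency, STRONG, TIMELINE) mirror the real HBase client enum, but this is a conceptual sketch, not the actual API; a real TIMELINE read tries the primary first and only falls back to a replica on timeout.

```python
# Toy model of timeline-consistent read replicas: writes go only to
# the primary; the replica catches up asynchronously and may be stale.
from enum import Enum


class Consistency(Enum):
    STRONG = 1    # read only from the primary (always current)
    TIMELINE = 2  # accept a possibly-stale read from a replica


class ToyRegion:
    def __init__(self):
        self.primary = {}
        self.replica = {}  # asynchronously replicated copy

    def put(self, key, value):
        self.primary[key] = value  # all writes are handled by the primary

    def replicate(self):
        self.replica = dict(self.primary)  # async catch-up, runs later

    def get(self, key, consistency=Consistency.STRONG):
        if consistency is Consistency.STRONG:
            return self.primary.get(key)
        return self.replica.get(key)  # TIMELINE: may lag the primary


r = ToyRegion()
r.put("k", "v1")
print(r.get("k", Consistency.STRONG))    # -> v1 (always current)
print(r.get("k", Consistency.TIMELINE))  # -> None (replica not caught up yet)
r.replicate()
print(r.get("k", Consistency.TIMELINE))  # -> v1
```

The per-request flag is the point: a dashboard query can tolerate a TIMELINE read that lags by a second, while an increment-then-read path keeps STRONG consistency against the primary.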
Many HBase applications are read-heavy, and with HBase HA it's straightforward to achieve four-nines availability for these workloads. Overall, HBase HA is a great addition for any mission-critical app on Hadoop.