An update of the "Hadoop and Kerberos: the Madness Beyond the Gate" talk, covering recent work on the "Fix Kerberos" JIRA and its first deliverable: KDiag.
Securing Hadoop's REST APIs with Apache Knox Gateway, Hadoop Summit, June 6th, ... | Kevin Minder
Securing Hadoop's REST APIs with Apache Knox Gateway
Presented at Hadoop Summit on June 6th, 2014
Describes the overall roles the Apache Knox Gateway plays in Hadoop security and briefly covers its primary features.
NL HUG 2016 Feb: Hadoop security from the trenches | Bolke de Bruin
Setting up a secure Hadoop cluster involves a magic combination of Kerberos, Sentry, Ranger, Knox, Atlas, LDAP and possibly PAM. Add encryption on the wire and at rest to the mix and you have, at the very least, an interesting configuration and installation task.
Nonetheless, the fact that there are a lot of knobs to turn doesn't excuse you from the responsibility of taking proper care of your customers' data. In this talk, we'll detail how the different security components in Hadoop interact and how easy it actually can be to set things up correctly, once you understand the concepts and tools. We'll outline a successful secure Hadoop setup with an example.
As Hadoop becomes a critical part of Enterprise data infrastructure, securing Hadoop has become critically important. Enterprises want assurance that all their data is protected and that only authorized users have access to the relevant bits of information. In this session we will cover all aspects of Hadoop security including authentication, authorization, audit and data protection. We will also provide demonstration and detailed instructions for implementing comprehensive Hadoop security.
An overview of the development of the Apache Hadoop software stack, including some of the barriers to participation -and how and why to overcome them. It closes with some open discussion points/ideas of how the existing process can be improved.
A comprehensive overview of the security concepts in the open source Hadoop stack in mid 2015 with a look back into the "old days" and an outlook into future developments.
An overview of securing Hadoop. Content primarily by Balaji Ganesan, one of the leaders of the Apache Argus project. Presented on Sept 4, 2014 at the Toronto Hadoop User Group by Adam Muise.
Data in Hadoop is getting bigger every day, consumers of the data are growing, and organizations are now looking at making their Hadoop clusters compliant with federal regulations and commercial demands. Apache Ranger simplifies the management of security policies across all components in Hadoop and provides granular access controls to data.
The deck describes what security tools are available in Hadoop and their purpose, then moves on to discuss Apache Ranger in detail.
Abstract:
As organizations start to roll out or migrate data-driven applications to Apache Hadoop, there are times when they have conflicting needs: to leverage their full co-mingled data sets in Hadoop while providing isolation of sections of such co-mingled data to a specific customer. Serving multiple customers in this manner is a typical multi-tenant use case, and one that can be challenging in Apache Hadoop.
This presentation walks through a number of patterns that can be leveraged for providing isolation of tenants based on the composability of Apache Knox for:
* Authentication/Federation Providers
* KnoxSSO
* Identity Assertion
* Tenant-specific topologies
With these patterns, Knox can provide an infrastructure for robust tenant isolation and access control for application UIs and REST APIs for your data landscape, when suitably coupled with a cluster that has carefully considered infrastructure including:
* Kerberos
* Tenant-specific user accounts, OUs and groups within LDAP
* Authorization policy that is aware of the tenant-specific groups
Summary:
We will walk through some of the patterns that have been used to enable such a multi-tenant environment as well as the specific considerations for topology, access control and user accounts involved with creating such an environment.
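The group-aware authorization mentioned in the bullets above can be pictured as a simple set intersection between the authenticated user's LDAP groups and the groups granted to a tenant's resources. A minimal sketch of that idea, with invented group and tenant names (this is not Knox or Ranger configuration):

```python
# Illustrative tenant-aware authorization check (hypothetical names):
# each tenant's resources are granted to tenant-specific LDAP groups,
# and a request is allowed only if the user holds one of those groups.
tenant_grants = {
    "tenant-a": {"cn=tenant-a-analysts", "cn=tenant-a-admins"},
    "tenant-b": {"cn=tenant-b-analysts"},
}

def is_authorized(user_groups, tenant):
    """True iff any of the user's groups is granted access to the tenant."""
    return bool(set(user_groups) & tenant_grants.get(tenant, set()))

print(is_authorized({"cn=tenant-a-analysts"}, "tenant-a"))  # True
print(is_authorized({"cn=tenant-a-analysts"}, "tenant-b"))  # False
```

The point of the pattern is that isolation lives in the group grants, not in the application: adding a tenant means adding LDAP groups and grants, with no code changes.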
Redis for Security Data: SecurityScorecard JVM Redis Usage | Timothy Spann
A quick talk about Java, Scala, Spring XD and Spring Data Redis against Redis.
An example of how we are using Redis at SecurityScorecard for security data and some hands-on development in Java and Scala.
Overview of Hadoop security (revised from a presentation at Hadoop in Taiwan, 2012). Details the configuration of security infrastructure leveraging Kerberos, along with extensive integration with LDAP aiming for fast exchange of cluster information. Also introduces the Etu Appliance at the end of the slides.
While traditional on-prem systems have always been a target for internal and external attackers, recent times have seen increased attacks on Hadoop cloud deployments. Hadoop systems will be increasingly targeted due to the large volumes of data they store. Many Hadoop installations in the cloud are publicly accessible without any security measures, which poses a threat of exfiltration of large datasets and possibly crypto-mining on this infrastructure, with its huge distributed compute capability.
Apache Knox provides multiple layers of security related to authentication, service-level authorization and web application security controls out of the box for multiple Hadoop components.
Apache Knox provides configuration to prevent common OWASP Top 10 security risks, e.g. Cross-Site Request Forgery (CSRF), Cross-Site Scripting (XSS), MIME content-type sniffing, clickjacking, etc. We will also discuss controls like HTTP Strict Transport Security, which prevents SSL downgrade attacks, and the CORS filter for allowing applications to make cross-domain requests through XHR only to specifically allowed hosts. Support for including/excluding cipher suites and excluding SSL protocols enables compliance with the hardening guidelines provided by CIS for application servers.
Knox supports several authentication mechanisms with Kerberos underneath, e.g. LDAP over SSL, AD, PAM-based auth for Unix users, integration with identity providers like Okta, etc. Capabilities like Trusted Proxy, Single Sign-On, the Hostmap Provider, the Identity Assertion Provider and client authentication enhance the overall security posture.
We will also cover the typical kill-chain methodology tailored to the Hadoop ecosystem, which will help formulate preventive measures against future compromises.
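Most of the web-app protections described above surface as HTTP response headers, so a quick client-side audit can verify that a gateway actually emits them. A hedged sketch (standard header names and example values, nothing Knox-specific):

```python
# Illustrative audit of the hardening headers discussed above.
# Header names are the standard HTTP ones; the values are typical
# examples, not Knox defaults.
EXPECTED = {
    "Strict-Transport-Security": None,    # HSTS: blocks SSL-downgrade attacks
    "X-Frame-Options": None,              # clickjacking protection
    "X-Content-Type-Options": "nosniff",  # MIME content-type sniffing
    "X-XSS-Protection": None,             # legacy XSS filter hint
}

def missing_hardening_headers(headers):
    """Return the expected security headers that are absent or mismatched."""
    problems = []
    for name, required in EXPECTED.items():
        value = headers.get(name)
        if value is None or (required is not None and value.lower() != required):
            problems.append(name)
    return problems

sample = {
    "Strict-Transport-Security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
}
print(missing_hardening_headers(sample))  # ['X-Frame-Options', 'X-XSS-Protection']
```

Running a check like this against a gateway endpoint (e.g. via an HTTP client) is a cheap way to confirm the configured providers are in effect.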
With the advent of Hadoop comes the need for professionals skilled in Hadoop administration, making it imperative to be skilled as a Hadoop admin for better career, salary and job opportunities.
Cloud deployments of Apache Hadoop are becoming more commonplace. Yet Hadoop and its applications don't integrate that well, something which starts right down at the file IO operations. This talk looks at how to make use of cloud object stores in Hadoop applications, including Hive and Spark. It will go from the foundational "what's an object store?" to the practical "what should I avoid" and the timely "what's new in Hadoop?", the latter covering the improved S3 support in Hadoop 2.8+. I'll explore the details of benchmarking and improving object store IO in Hive and Spark, showing what developers can do in order to gain performance improvements in their own code, and equally, what they must avoid. Finally, I'll look at ongoing work, especially "S3Guard" and what its fast and consistent file metadata operations promise.
Deploying Enterprise-grade Security for Hadoop | Cloudera, Inc.
Deploying enterprise-grade security for Hadoop, or: six security problems with Apache Hive. In this talk we will discuss the security problems with Hive and then secure Hive with Apache Sentry. Additional topics include Hadoop security and role-based access control (RBAC).
Introduction to Cloudera's Administrator Training for Apache Hadoop | Cloudera, Inc.
Learn who is best suited to attend the full Administrator Training, what prior knowledge you should have, and what topics the course covers. Cloudera Senior Curriculum Manager, Ian Wrigley, will discuss the skills you will attain during Admin Training and how they will help you move your Hadoop deployment from strategy to production and prepare for the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam.
The fundamentals and best practices of securing your Hadoop cluster are top of mind today. In this session, we will examine and explain the components, tools, and frameworks used in Hadoop for authentication, authorization, audit, and encryption of data and processes. See how the latest innovations can let you securely connect more data to more users within your organization.
Big Data projects have moved past their POC phases, where security was, at best, a secondary concern. Because of this, Big Data tools, and more specifically those used to process data, must catch up on security.
Tools like Spark were not designed with security in mind. That is why Abel and Jorge want to share the hacks they had to apply to Spark in order to use Kerberos to authenticate against secured services.
You got your cluster installed and configured. You celebrate, until the party is ruined by your company's security officer stamping a big "Deny" on your Hadoop cluster. And oops!! You cannot place any data onto the cluster until you can demonstrate it is secure. In this session you will learn the tips and tricks to fully secure your cluster for data at rest, data in motion and all the apps, including Spark. Your security officer can then join your Hadoop revelry (unless you don't authorize him to, with your newly acquired admin rights).
Kerberos is the system which underpins the vast majority of strong authentication across the Apache HBase/Hadoop application stack. Kerberos errors have brought many to their knees and it is often referred to as “black magic” or “the dark arts”; a long-standing joke that there are so few who understand how it works. This talk will cover the types of problems that Kerberos solves and doesn’t solve for HBase, decrypt some jargon on related libraries and technology that enable Kerberos authentication in HBase and Hadoop, and distill some basic takeaways designed to ease users in developing an application that can securely communicate with a “kerberized” HBase installation.
Kerberos is the system which underpins the vast majority of strong authentication across the Apache Hadoop application stack. Kerberos errors have brought many to their knees and it is often referred to as “black magic” or “the dark arts”; a long-standing joke that there are so few who understand how it works. This talk will cover the types of problems that Kerberos solves and doesn’t solve for Accumulo, decrypt some jargon on related libraries and technology that enable Kerberos authentication in Accumulo and Hadoop, and distill some basic takeaways designed to ease users in developing an application that can securely communicate with a “kerberized” Accumulo installation.
— Speaker —
Josh Elser
Staff Software Engineer, Hortonworks
Josh is a member of the engineering staff at Hortonworks. He is a strong advocate for open source software, taking an active role in the Apache Software Foundation. Josh enjoys working on distributed systems problems, and regularly contributes to many Apache projects, most frequently to Accumulo, Calcite, HBase, Phoenix, and Slider. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
— More Information —
For more information see http://www.accumulosummit.com/
Discover HDP 2.1: Apache Storm for Stream Data Processing in Hadoop | Hortonworks
For the first time, Hortonworks Data Platform ships with Apache Storm for processing stream data in Hadoop.
In this presentation, Himanshu Bari, Hortonworks senior product manager, and Taylor Goetz, Hortonworks engineer and committer to Apache Storm, cover Storm and stream processing in HDP 2.1:
+ Key requirements of a streaming solution and common use cases
+ An overview of Apache Storm
+ Q & A
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J... | Spark Summit
Since April 2016, Spark-as-a-service has been available to researchers in Sweden from the Swedish ICT SICS Data Center at www.hops.site. Researchers work in an entirely UI-driven environment on a platform built with only open-source software.
Spark applications can be either deployed as jobs (batch or streaming) or written and run directly from Apache Zeppelin. Spark applications are run within a project on a YARN cluster with the novel property that Spark applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In this talk we will discuss the challenges in building multi-tenant Spark streaming applications on YARN that are metered and easy to debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running Spark streaming applications, how we use Grafana and Graphite for monitoring Spark streaming applications, and how users can debug and optimize terminated Spark Streaming jobs using Dr Elephant. We will also discuss the experiences of our users (over 120 users as of Sept 2016): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.
To conclude, we will also give an overview on our course ID2223 on Large Scale Learning and Deep Learning, in which 60 students designed and ran SparkML applications on the platform.
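The project-private Kafka topic rule described above reduces to a membership check before any topic operation. A minimal sketch of that isolation idea, with invented names and data structures (this is not the HopsWorks implementation):

```python
# Illustrative project-based topic isolation: a topic belongs to
# exactly one project, and only that project's members may
# produce to or consume from it.
projects = {
    "astro":    {"members": {"alice", "bob"}, "topics": {"astro-events"}},
    "genomics": {"members": {"carol"},        "topics": {"dna-reads"}},
}

def can_access_topic(user, topic):
    """True iff the user is a member of the project that owns the topic."""
    for project in projects.values():
        if topic in project["topics"]:
            return user in project["members"]
    return False  # unknown topics are denied by default

print(can_access_topic("alice", "astro-events"))  # True: project member
print(can_access_topic("carol", "astro-events"))  # False: not a member
```

Tying topic ownership to project membership is what lets the platform meter and charge usage per project while keeping tenants isolated.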
This webinar series covers Apache Kafka and Apache Storm for streaming data processing. Also, it discusses new streaming innovations for Kafka and Storm included in HDP 2.2
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive | Hortonworks
In February 2013, the open source community launched the Stinger Initiative to improve speed, scale and SQL semantics in Apache Hive. After thirteen months of constant, concerted collaboration (and more than 390,000 new lines of Java code) Stinger is complete with Hive 0.13.
In this presentation, Carter Shanklin, Hortonworks director of product management, and Owen O'Malley, Hortonworks co-founder and committer to Apache Hive, discuss how Hive enables interactive query using familiar SQL semantics.
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
**Note**
The event is seat-limited; therefore, please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights and networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00-06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40-07:20: Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30: Q&A
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
26Oct2023: Adding Generative AI to Real-Time Streaming Pipelines, NYC Meetup | Timothy Spann
26Oct2023_ Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup.pdf
## Details
**Important**
Please complete your registration in this short form.
For on-site we have limited room, so please confirm if you are attending in-person in Manhattan, NYC.
--------------------------------------------------------------------------------------------
We at StarTree are excited to join forces with our friends at Cloudera for a meetup all about the latest in real-time analytics: Generative AI and LLMs, featuring Apache Pinot and Apache NiFi.
Join us for an insightful discussion about cutting-edge analytics, meet the community in person, and catch up over drinks and snacks.
What's the plan?
05:30-06:00 Pizza and Networking
06:00-06:35 Adding Generative AI to Real-Time Streaming Pipelines | Tim Spann, Principal Developer Advocate, Cloudera
06:35-07:10 Apache Pinot and Kafka: an excellent pairing for refined palates | Tim Veil, VP of Solutions Engineering and Enablement, StarTree
07:10-07:20 Q&A
07:20-07:30 More Snacks and Networking ;)
Adding Generative AI to Real-Time Streaming Pipelines | Timothy Spann
In this talk, Tim will discuss the basics of real-time streaming, walk through the tools used including Apache NiFi, Apache Kafka and Apache Flink and show how to build a real-time streaming pipeline that sends prompts to LLMs hosted by the likes of Hugging Face, IBM and Cloudera. He will also discuss where real-time data stores like Apache Pinot come into play.
He will show a detailed demonstration of a few use cases involving different sources of data including Kafka, Medium Articles and interactive Question and Response in Slack. He will then show you how you can build your own and where areas of growth exist.
https://github.com/tspannhw/SpeakerProfile
Apache Pinot and Kafka: an excellent pairing for refined palates | Tim Veil
The other Tim, Tim Veil, will dive into the history and architect…
A review of the state of cloud store integration with the Hadoop stack in 2018; including S3Guard, the new S3A committers and S3 Select.
Presented at Dataworks Summit Berlin 2018, where the demos were live.
Hopsworks Secure Streaming as-a-service with Kafka, Flink, Spark - Theofilos Kak... | Evention
Since June 2016, Kafka, Spark and Flink-as-a-service have been available to researchers and companies in Sweden from the Swedish ICT SICS Data Center at www.hops.site using the HopsWorks platform (www.hops.io). Flink and Spark applications are run within a project on a YARN cluster with the novel property that applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In this talk we will discuss the challenges in building multi-tenant streaming applications on YARN that are metered and easy to debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running streaming applications, how we use Grafana and Graphite for monitoring streaming applications, and how users can debug and optimize terminated Spark Streaming jobs using Dr Elephant. We will also discuss the experiences of our users (over 120 users as of Oct 2016): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications. Hopsworks is entirely UI-driven with an Apache v2 open source license.
2024 Feb AI Meetup NYC (GenAI, LLMs, ML, Data): Codeless Generative AI Pipelines | Timothy Spann
https://www.aicamp.ai/event/eventdetails/W2024022214
apache nifi
llm
generative ai
gen ai
ml
dl
machine learning
apache kafka
apache flink
postgresql
python
AI Meetup (NYC): GenAI, LLMs, ML and Data
Feb 22, 05:30 PM EST
Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep-dive tech talks on AI, GenAI, LLMs and machine learning, plus food/drink and networking with speakers and fellow developers.
Agenda:
* 5:30pm~6:00pm: Checkin, Food/drink and networking
* 6:00pm~6:10pm: Welcome/community update
* 6:10pm~8:30pm: Tech talks
* 8:30pm: Q&A, Open discussion
Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs
Speaker: Zain Hasan (Weaviate LinkedIn)
Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can see, hear, read, and feel data(!) to perform cross-modal search (searching audio with images, videos with text, etc.) and multimodal retrieval-augmented generation (MM-RAG) at the billion-object scale with the help of open-source vector databases. He will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real time enables users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production.
Tech Talk: Codeless Generative AI Pipelines
Speaker: Timothy Spann (Cloudera LinkedIn)
Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task.
Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Sponsors:
We are actively seeking sponsors to support our community, whether by offering venue space, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ worldwide developers.
Venue:
Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036
Room Name: Central Park West 6501
Community on Slack/Discord
- Event chat: chat and connect with speakers and attendees
- Sharing blogs, events, job openings, projects collaborations
Join Slack (search and join the #newyork channel) | Join Discord
August 2018 version of my "What does rename() do", includes the full details on what the Hadoop MapReduce and Spark commit protocols are, so the audience will really understand why rename really, really matters
Put is the new rename: San Jose Summit EditionSteve Loughran
This is the June 2018 variant of the "Put is the new Rename Talk", looking at Hadoop stack integration with object stores, including S3, Azure storage and GCS.
The lessons from implementing a twitter bot designed to live on a raspberry pi and heckle politicians —and deployed into production in the 2017 UK General Election
Berlin Buzzwords 2017 talk: A look at what our storage models, metaphors and APIs are, showing how we need to rethink the Posix APIs to work with object stores, while looking at different alternatives for local NVM.
This is the unabridged talk; the BBuzz talk was 20 minutes including demo and questions, so had ~half as many slides
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
A talk looking at the intricate details of working with an object store from Hadoop, Hive, Spark, etc., why the "filesystem" metaphor falls down, and what work I and others have been doing to try to fix things
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
The March 2017 version of the "Apache Spark and Object Stores", includes coverage of the Staging Committer. If you'd been at the talk you'd have seen the projector fail just before the demo. It worked earlier! Honest!
My talk from Berlin Buzzwords 2016, looking at whether it is actually possible to lock down a household to the extent you could call it "secure". I also try to highlight that we need to consider "privacy" alongside security.
An iteration of the Hoya slides put up as part of a review of the code with others writing YARN services; it looks at what Hoya offers, what we need from applications to be able to deploy them this way, and what we need from YARN.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though 'java.lang.OutOfMemoryError' appears at surface level to be one single error, underneath it there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Navigating the Metaverse: A Journey into Virtual EvolutionDonna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide deck, we show a simulation example and how to compile the solver.
In this solver, the Helmholtz equation can be solved with helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but it did reach 63K downloads (powering possibly tens of thousands of websites).
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Paketo Buildpacks: the best way to build OCI images? DevopsDa...Anthony Dahanne
Buildpacks have been around for more than 10 years! At first they were used to detect and build an application before deploying it to certain PaaS platforms. Since then, their latest generation, the Cloud Native Buildpacks (a CNCF incubating project), has let us build Docker (OCI) images. Are they a good alternative to the Dockerfile? What are the Paketo buildpacks? Which communities support them, and how?
Come and find out in this ignite session.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge to organize and improve your code review process.
Enough people like Dunkin' Donuts decaf coffee that you can buy it for home use, and supermarkets will stock it next to the McDonald's coffee.
This is your get-out clause: turn off security. Users are who they claim to be; the environment variable HADOOP_USER_NAME can change identity on a whim.
...which is why production clusters are all locked down with Kerberos.
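As a sketch of just how weak simple authentication is (the hostnames, usernames and paths below are illustrative, and this assumes a cluster with security disabled), impersonating another user takes one environment variable:

```shell
# With simple auth, the Hadoop CLI trusts whatever identity the client claims.
# One environment variable is enough to "become" the HDFS superuser:
export HADOOP_USER_NAME=hdfs

# Every subsequent command now runs as "hdfs":
hdfs dfs -ls /user/alice          # browse someone else's home directory
hdfs dfs -rm -r /tmp/audit-logs   # or delete whatever you like
```

No password, no ticket, no audit trail worth the name: whoever can reach the cluster can be anyone.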
Callout: this doesn't cover authorization/access control (exception: Hadoop IPC acls), wire encryption, HTTPS or data encryption.
So you can't ignore Kerberos. You only get a choice about when to encounter it:
- early on, in your coding and testing
- during final integration tests
- in late-night support calls
The KDC is managed by the enterprise security team. They are either paranoid about security, or your organisation is 0wned by everyone from Anonymous to North Korea. They don't trust you, they don't trust Hadoop, and they make the rest of the network ops people seem welcoming.
You will need to work with these people.
AuthenticatedURL
DelegationTokenAuthenticatedURL
org.apache.hadoop.hdfs.web.URLConnectionFactory
org/apache/spark/deploy/history/yarn/rest in SPARK-1537
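Outside the JVM, the SPNEGO handshake these classes perform can be sketched with curl; the principal, keytab path and NameNode address below are placeholders, and this assumes a curl build with GSS-API/SPNEGO support:

```shell
# Get a Kerberos ticket first (principal and keytab are placeholders):
kinit -kt /etc/security/keytabs/alice.keytab alice@EXAMPLE.COM

# --negotiate asks curl to do the SPNEGO handshake; "-u :" tells it to use
# the credentials in the Kerberos ticket cache rather than a password.
curl --negotiate -u : "http://namenode.example.com:9870/webhdfs/v1/tmp?op=LISTSTATUS"
```

This is also a handy first diagnostic when the Java clients fail: if curl can't authenticate either, the problem is the ticket or the realm, not your code.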
There is a mini KDC, "MiniKDC", in the Hadoop codebase. I've used this in the YARN-913 registry work; it's good for verifying that you got through the permissions logic, and for learning various acronyms. And at the end of the run you get tests that Jenkins can run every build.
But I've embraced testing against kerberized VMs, where you do the work of creating keytabs, filling in the configuration files, requiring SPNEGO-authed web browsers, having your command-line account kinit in regularly, services having tokens expire, etc. Why? Because it's what the real world is like.
Error messages with UGI are usually a sign of trouble
Photo: https://www.flickr.com/photos/doctorserone/4635167170/