The last five to ten years have seen massive advances in open source Internet-wide mass-scan tooling, on-demand cloud computing, and high-speed Internet connectivity. This has led to a massive influx of different groups mass-scanning all four billion IP addresses in the IPv4 space on a constant basis. Information security researchers, cybersecurity companies, search engines, and criminals scan the Internet for a variety of benign and nefarious reasons (such as the WannaCry ransomware and multiple MongoDB, Elasticsearch, and Memcached ransomware variants). It is increasingly difficult to differentiate between scan/attack traffic targeting your organization specifically and opportunistic mass-scan background-radiation packets.
GreyNoise is a system that records and analyzes the collective omnidirectional background noise of the Internet, performs enrichments and analytics, and makes the data available to researchers for free. Traffic is collected by a large network of geographically and logically diverse "listener" servers distributed across data centers belonging to different cloud providers and ISPs around the world.
In this talk I will candidly discuss the motivations for developing the system; take a technical deep dive into the architecture, data pipeline, and analytics; present observations and analysis of the traffic collected by the system; and cover business impacts for network operators, pitfalls and lessons learned, and the vision for the system moving forward.
Automating Kubernetes Environments with Ansible | Timothy Appnel
Ansible fits naturally into any Kubernetes environment. Both are very active and widely used open source projects with vibrant communities that help make hard things easier. Here, we explore ways how...
Cloud Foundry: The Platform for Forging Cloud Native Applications | Chip Childers
It wasn’t too long ago that artisans, bathed in the glow of molten metal, forged parts that would go on to make up bigger, more powerful machines. Today, we call those artisans developers. Instead of metal, they use bits and bytes in the cloud to forge a modern application architecture that supports public, private and hybrid application deployment. One that enables users and developers to move their applications wherever they need to go. And it’s built on a growing, vibrant ecosystem.
Nowhere is this epic shift in how things are made more visible than in the meteoric adoption of Cloud Foundry. In this talk, Chip Childers, VP of Technology for the Cloud Foundry Foundation, will give attendees an inside look at the industry movements and the technological requirements that are driving Cloud Foundry's rapid adoption. Most importantly, he will walk through how organizations are responding to the challenge of continuous innovation, what's driving modern application architectures, and how the Cloud Foundry platform uses specific constraints in order to fulfill its promise to application owners.
Validation and Verification using Rational DOORS for Aerospace | Hellasserve
This presentation shows the implementation of verification and validation in aerospace using IBM Rational DOORS to demonstrate compliance with requirements standards such as DO-178C and ARP4754.
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka | Guozhang Wang
To manage the ever-increasing volume and velocity of data within your company, you have successfully made the transition from single machines and one-off solutions to large distributed stream infrastructures in your data center, powered by Apache Kafka. But what if one data center is not enough? I will describe building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence, and provide an overview of best practices and common patterns while covering key areas such as architecture guidelines, data replication, and mirroring as well as disaster scenarios and failure handling.
이성민 / Netflix - [Special Session] "If Only You Knew What I Know," as Told by a Senior Engineer
"Every engineer grows through failure, and I was no exception.
Today I would like to share the stories I wish I had heard when I was a junior engineer."
Video: https://youtu.be/MXl_t1vjkyU
Host: https://www.facebook.com/groups/InfraEngineer
Apache Kafka has evolved from an enterprise messaging system into a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, which I gave at the Chicago Java Users Group (CJUG) on June 8, 2017, focuses mainly on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka, using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: Before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams key concepts? Kafka Streams APIs and code examples?
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
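As a conceptual illustration of the input-topic-to-output-topic model in item 3 (plain Python standing in for the actual Kafka Streams Java API, with the topics simulated as iterables), here is a minimal word-count sketch:

```python
from collections import Counter

def word_count(stream):
    """Consume an input stream of text lines and maintain a running count
    per word, emitting (word, count) updates like a changelog of a KTable."""
    counts = Counter()
    for line in stream:
        for word in line.lower().split():
            counts[word] += 1
            yield (word, counts[word])

# "Topics" simulated as plain Python iterables for illustration only.
updates = list(word_count(["hello kafka", "hello streams"]))
print(updates)  # [('hello', 1), ('kafka', 1), ('hello', 2), ('streams', 1)]
```

The real Kafka Streams API expresses the same per-record, incrementally-updated computation declaratively over actual Kafka topics.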
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022 | Hosted by Confluent
Idempotence is a mathematical property of certain operations: the operation can be applied multiple times without changing the result beyond the initial application.
The main driver behind the idempotency requirement is often the need to handle duplicated messages. As developers and architects, we need to pay close attention to how we deal with our production data during new deployments to ensure we are not losing data, duplicating messages, or introducing malformed data into our system. Furthermore, we need to figure out how to automate the process and add testing guarantees to prevent potential human error.
In this session, you will learn about the idempotent Kafka producer and consumer architecture and how to automate the CI/CD process with open-source tools.
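The duplicate-handling driver above can be sketched as a consumer that remembers which message IDs it has already applied, so a redelivery never repeats the side effect. This is a minimal in-memory illustration (the class, field names, and "balance" side effect are made up for the example; a real system would persist the seen-ID set durably):

```python
class IdempotentConsumer:
    """Apply each message at most once by remembering processed message IDs."""

    def __init__(self):
        self.seen = set()   # in production: a durable store, not memory
        self.balance = 0    # example side effect: an account balance

    def handle(self, msg_id, amount):
        if msg_id in self.seen:   # duplicate delivery: skip the side effect
            return False
        self.seen.add(msg_id)
        self.balance += amount
        return True

c = IdempotentConsumer()
c.handle("m1", 100)
c.handle("m1", 100)  # redelivered duplicate is ignored
c.handle("m2", 50)
print(c.balance)  # 150, not 250
```

Processing the same delivery twice leaves the state unchanged, which is exactly the "no change beyond the initial application" property defined above.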
By Tom Wilkie, delivered at London Microservices User Group on 2/12/15
The rise of microservice-based applications has had many knock-on effects, not least on the complexity of monitoring your application. An order-of-magnitude increase in the number of moving parts and in the application's rate of change requires us to reassess traditional monitoring techniques.
In this talk we will discuss some different approaches to monitoring, visualising and tracing containerised, microservices-based applications. We’ll present different techniques to some of the emergent problems, and try not to rant too much.
Introducing the Confluent Labs Parallel Consumer client | Anthony Stubbes, Confluent | Hosted by Confluent
Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder, why would we want anything else? It turns out that, in practice, there are a number of situations where Kafka’s partition-level parallelism gets in the way of optimal design.
This session will go over some of these types of situations that can benefit from parallel message processing within a single application instance (aka slow consumers or competing consumers), and then introduce the new Parallel Consumer labs project from Confluent, which can improve functionality and massively improve performance in such situations.
It will cover the following:
- Different ordering modes of the client
- Relative performance improvements
- Usage with other components like Kafka Streams
- An introduction to the internal architecture of the project
- How it can achieve all this in a reassignment-friendly manner
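The key-ordering idea behind such a client can be sketched in a few lines: process different keys concurrently while keeping each key's records in arrival order. This is a conceptual Python sketch of the ordering mode, not the actual Parallel Consumer API (function and parameter names are illustrative):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def process_key_ordered(records, handler, workers=4):
    """Key-level parallelism: records sharing a key run in order within one
    task, while different keys are processed concurrently."""
    by_key = defaultdict(list)
    for key, value in records:        # preserve per-key arrival order
        by_key[key].append(value)

    def run(key):
        # All records for one key are handled serially by a single task.
        return key, [handler(v) for v in by_key[key]]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, by_key))

out = process_key_ordered([("a", 1), ("b", 10), ("a", 2)],
                          handler=lambda v: v * 2)
print(out)  # {'a': [2, 4], 'b': [20]}
```

This shows why key-level ordering can use far more parallelism than partition-level ordering: the unit of serialization shrinks from a whole partition to a single key.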
Integrating Splunk into your Spring Applications | Damien Dallimore
How much visibility do you really have into your Spring applications? How effectively are you capturing, harnessing, and correlating the logs, metrics, and messages from your Spring applications that can be used to deliver this visibility? What tools and techniques are you providing your Spring developers with to better create and utilize this mass of machine data? In this session I'll answer these questions and show how Splunk can be used not only to provide historical and real-time visibility into your Spring applications, but also as a platform that developers can use to become more "devops effective" and easily create custom big data integrations and standalone solutions. I'll discuss and demonstrate many of Splunk's Java apps, frameworks, and SDKs, and also cover the Spring Integration Adaptors for Splunk.
Spring Cloud Function: Where We Were, Where We Are, and Where We’re Going | VMware Tanzu
SpringOne 2021
Session Title: Spring Cloud Function: Where We Were, Where We Are, and Where We’re Going
Speakers: Marc DiPasquale, Developer Advocate at Solace; Mark Sailes, Specialist Solutions Architect, Serverless at Amazon Web Services; Oleg Zhurakousky, Developer at VMware
Intelligently collecting data at the edge—intro to Apache MiNiFi | DataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr Member of Technical Staff, Hortonworks
This presentation is primarily focused on how to use collectd (http://collectd.org/) to gather data from the Postgres statistics tables. Examples of how to use collectd with Postgres will be shown. There is some hackery involved to make collectd do a little more and collect more meaningful data from Postgres. These small patches will be explored. A small portion of the discussion will be about how to visualize the data.
고승범 (peter.ko) / kakao corp. (Infrastructure Team 2)
---
At Kakao, we operate Kafka, which has rapidly emerged as the solution connecting everything from big data analysis and processing to all of our development platforms, as a company-wide shared service. I would like to share the troubleshooting experience and operational know-how we have gained by running this company-wide Kafka ourselves. In particular, I will cover the caveats of using producers and consumers, something both Kafka newcomers and existing users frequently ask about.
The World of Messaging, Seen Through Spring Integration | Wangeun Lee
[Spring Camp 2015] These are the slides for "The World of Messaging, Seen Through Spring Integration."
The example source repository is linked inside the presentation.
Thank you.
-------------------------------------------------------------------
We are always communicating with someone. Through communication we ask others to do work, and we receive work ourselves. Applications are no different: heterogeneous applications communicate with each other through data, and situations arise where they must distribute work among themselves.
Before such distributed processing can happen, communication must come first. Pioneers distilled their thinking about inter-application communication into the Enterprise Integration Patterns, and Spring created Spring Integration as an abstraction of those patterns.
This talk looks at how Spring Integration lets applications communicate with each other easily and comfortably(?), and aims to help you get started with Spring Integration through examples and case studies.
Using Riak for Events storage and analysis at Booking.com | Damien Krotkine
At Booking.com, we have a constant flow of events coming from various applications and internal subsystems. This critical data needs to be stored for real-time, medium-term, and long-term analysis. Events are schema-less, making it difficult to use standard analysis tools. This presentation will explain how we built a storage and analysis solution based on Riak. The talk will cover: data aggregation and serialization, Riak configuration, solutions for lowering network usage, and finally, how Riak's advanced features are used to perform real-time data crunching on the cluster nodes.
Distributed Sensor Data Contextualization for Threat Intelligence Analysis | Jason Trost
As organizations operationalize diverse network sensors of various types, from passive sensors to DNS sinkholes to honeypots, there are many opportunities to combine this data for increased contextual awareness for network defense and threat intelligence analysis. In this presentation, we discuss our experiences by analyzing data collected from distributed honeypot sensors, p0f, snort/suricata, and botnet sinkholes as well as enrichments from PDNS and malware sandboxing. We talk through how we can answer the following questions in an automated fashion: What is the profile of the attacking system? Is the host scanning/attacking my network an infected workstation, an ephemeral scanning/exploitation box, or a compromised web server? If it is a compromised server, what are some possible vulnerabilities exploited by the attacker? What vulnerabilities (CVEs) has this attacker been seen exploiting in the wild and what tools do they drop? Is this attack part of a distributed campaign or is it limited to my network?
Attackers don't just search for technology vulnerabilities; they take the easiest path and find the human vulnerabilities. Drive-by web attacks, targeted spear phishing, and more are commonplace today, with the goal of delivering custom malware. In a world where attackers deliver custom advanced malware that handily evades signature and blacklisting approaches and does not depend on application software vulnerabilities, how do we know when our environments are compromised? What are the telltale signs that compromise activity has started, and how can we move to arrest a compromise in progress before the attacker moves laterally and reinforces their position? The penetration testing community knows these signs and artifacts of advanced malware presence, and it is up to us to help educate defenders on what to look for.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production | Codemotion
What's important about a technology is what you can use it to do. I've looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
Network Forensics and Practical Packet Analysis | Priyanka Aash
Why Packet Analysis?
3 Phases - Analysis, Conversion & Collection
How do we do it?
Statistics - Protocol Hierarchy
Statistics - End Points & Conversations
Messaging, interoperability and log aggregation - a new framework | Tomas Doran
In this talk, I will discuss why log files are horrible, and how to log structured log lines and performance metrics from large-scale production applications, as well as how to build reliable, scalable and flexible large-scale software systems in multiple languages.
I will explain why (almost) all log formats are horrible and why JSON is a good solution for logging, and discuss a number of message queuing, middleware and network transport technologies, including STOMP, AMQP and ZeroMQ.
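The argument for JSON logging can be shown in miniature: each log line is a self-describing record rather than free text, so downstream aggregators can parse it without fragile regexes. A minimal sketch (the helper and field names are illustrative, not part of any framework mentioned here):

```python
import json
import time

def log_event(write, **fields):
    """Emit one self-describing JSON log line instead of a free-text message."""
    record = {"ts": time.time(), **fields}
    write(json.dumps(record, sort_keys=True) + "\n")

lines = []
log_event(lines.append, level="info", event="user_login",
          user="alice", duration_ms=42)

parsed = json.loads(lines[0])
print(parsed["event"], parsed["duration_ms"])  # fields come back typed
```

Because the values round-trip with their types intact (`duration_ms` is an integer, not a substring to re-parse), filtering and aggregating downstream becomes trivial.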
The Message::Passing framework will be introduced, along with the logstash.net project, with which the Perl code is interoperable. These are pluggable frameworks in Ruby/Java/JRuby and Perl with pre-written sets of inputs, filters and outputs for many different systems, message formats and transports.
They were initially designed to be aggregators and filters of data for logging. However they are flexible enough to be used as part of your messaging middleware, or even as a replacement for centralised message queuing systems.
You can have your cake and eat it too - an architecture which is flexible, extensible, scalable and distributed. Build discrete, loosely coupled components which just pass messages to each other easily.
Integrate and interoperate with your existing code and code bases easily, consume from or publish to any existing message queue, logging or performance metrics system you have installed.
Simple examples using common input and output classes will be demonstrated using the framework, as will easily adding your own custom filters. A number of common messaging middleware patterns will be shown to be trivial to implement.
Some higher level use-cases will also be explored, demonstrating log indexing in ElasticSearch and how to build a responsive platform API using webhooks.
Interoperability is also an important goal for messaging middleware. The logstash.net project will be highlighted and we'll discuss crossing the single language barrier, allowing us to have full integration between java, ruby and perl components, and to easily write bindings into libraries we want to reuse in any of those languages.
From a Student to an Apache Committer: Practice of Apache IoTDB | jixuan1989
This talk was given by Xiangdong Huang, a PPMC member of the Apache IoTDB (incubating) project, at the Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays an increasingly important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and the industrial Internet. Many companies have gradually increased their participation in the open source community, and developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software projects and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
Bullet is an open-source, lightweight, pluggable querying system for streaming data, implemented on top of Storm without a persistence layer. It allows you to filter, project, and aggregate data in transit. It includes a UI and a web service. Instead of running queries on a finite set of data that arrived and was persisted, or running a static query defined at the startup of the stream, our queries can be executed against an arbitrary set of data arriving after the query is submitted. In other words, it is a look-forward system.
Bullet is a multi-tenant system that scales independently of the data consumed and the number of simultaneous queries. Bullet is pluggable into any streaming data source. It can be configured to read from systems such as Storm, Kafka, Spark, Flume, etc. Bullet leverages Sketches to perform its aggregate operations such as distinct, count distinct, sum, count, min, max, and average.
An instance of Bullet is currently running at Yahoo against its user engagement data pipeline. We’ll highlight how it is powering internal use-cases such as web page and native app instrumentation validation. Finally, we’ll show a demo of Bullet and go over query performance numbers.
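The filter/project/aggregate shape of a look-forward query can be sketched generically. This illustrative Python sketch mimics the query shape described above (the function, parameters, and record fields are assumptions for the example, not Bullet's actual API):

```python
def look_forward_query(stream, where, select, agg):
    """Filter, project, and aggregate records as they arrive in transit,
    with no persistence layer: the query sees only data that comes after
    it is submitted."""
    projected = ({k: r[k] for k in select} for r in stream if where(r))
    return agg(projected)

# Records "arriving" after the query is submitted, as a lazy iterator.
events = iter([
    {"page": "home", "ms": 120, "bot": False},
    {"page": "home", "ms": 80,  "bot": True},
    {"page": "item", "ms": 200, "bot": False},
])

result = look_forward_query(
    events,
    where=lambda r: not r["bot"],          # filter
    select=["page", "ms"],                 # project
    agg=lambda rs: sum(r["ms"] for r in rs),  # aggregate (e.g. sum)
)
print(result)  # 320
```

Bullet performs aggregations like count distinct with probabilistic Sketches so memory stays bounded; the sum above merely stands in for that step.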
Staying Ahead of Internet Background Exploitation - Microsoft BlueHat Israel ... | Andrew Morris
It’s not just you. The frequency of severe vulnerabilities in internet-facing enterprise software being massively exploited at scale has increased drastically. The amount of time between disclosure and exploitation of these vulnerabilities has been reduced to near-zero, leaving defenders with less time to react and respond. While combating internet-wide opportunistic exploitation is a sprawling and complex problem, there is both an art and a science to staying ahead of large exploitation events such as Log4J.
In this talk we will share insights and challenges from operating a huge, shifting, adaptive, distributed sensor network listening to internet background noise and opportunistic exploitation traffic over the past four years. We will give a blunt state of the universe on mass exploitation. We will share patterns and unexplainable phenomena we’ve experienced across billions of internet scans. And we will make recommendations to defenders for preparing for the next time the cyber hits the fan.
Using GreyNoise to Quantify Response Time of Cloud Provider Abuse Teams | Andrew Morris
Cloud hosting providers, such as Amazon AWS, Google Cloud, DigitalOcean, Microsoft Azure, and many others, have to respond to a regular barrage of abuse complaint reports from all around the world when their customers' virtual private servers are used for malicious activity. This activity can happen knowingly, by the "renter" of the system, or on behalf of an attacker if the server becomes infected. Although by no means the end-all, one way of measuring the trust posture of a cloud hosting provider is by analyzing the amount of time between shared hosts beginning to attack other hosts on the Internet and the activity ceasing, generally by way of forced decommissioning, quarantining, or remediation of the root cause, such as a malware infection. In this talk, we discuss using the data collected by GreyNoise, a large network of passive collector nodes, to measure the time-to-remediation of infected or malicious machines. We will discuss methodology, results, and actionable takeaways for conference attendees who use shared cloud hosting in their businesses.
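The measurement described, time between attack activity starting and ceasing per provider, reduces to simple arithmetic over observed sightings. A hedged sketch, with made-up provider names and timestamps standing in for real GreyNoise observations:

```python
from datetime import datetime
from statistics import median

def time_to_remediation(sightings):
    """Given (provider, first_seen, last_seen) sightings of malicious hosts,
    return the median hours each provider took before activity ceased."""
    per_provider = {}
    for provider, first_seen, last_seen in sightings:
        hours = (last_seen - first_seen).total_seconds() / 3600
        per_provider.setdefault(provider, []).append(hours)
    return {p: median(hs) for p, hs in per_provider.items()}

ttr = time_to_remediation([
    ("cloud-a", datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 12)),
    ("cloud-a", datetime(2019, 1, 2, 0), datetime(2019, 1, 3, 0)),
    ("cloud-b", datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 6)),
])
print(ttr)  # {'cloud-a': 18.0, 'cloud-b': 6.0}
```

The median (rather than the mean) keeps one host that is never cleaned up from dominating a provider's score.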
Identifying and Correlating Internet-wide Scan Traffic to Newsworthy Security...Andrew Morris
In this presentation, we will discuss using GreyNoise, a geographically and logically distributed system of passive Internet scan traffic collector nodes, to identify statistical anomalies in global opportunistic Internet scan traffic and correlate these anomalies with publicly disclosed vulnerabilities, large-scale DDoS attacks, and other newsworthy events. We will discuss establishing (and identifying any deviations away from) a “standard” baseline of Internet scan traffic. We will discuss successes and failures of different methods employed over the past six months. We will explore open questions and future work on automated anomaly detection of Internet scan traffic. Finally, we will provide raw data and a challenge as an exercise to the attendees.
ShmooCon 2015: No Budget Threat Intelligence - Tracking Malware Campaigns on ...Andrew Morris
In this talk, I'll be discussing my experience developing intelligence-gathering capabilities to track several different independent groups of threat actors on a very limited budget (read: virtually no budget whatsoever). I'll discuss discovering the groups using open source intelligence gathering and honeypots, monitoring attacks, collecting and analyzing malware artifacts to figure out what their capabilities are, and reverse engineering their malware to develop the capability to track their targets in real time. Finally, I'll chat about defensive strategies and provide recommendations for enterprise security analysts and other security researchers.
3. About Me
• Andrew Morris
• Background in offensive cyber stuff, security research
• Previously:
  • Endgame R&D
  • Intrepidus (NCC Group)
  • KCG (ManTech)
• Twitter: @Andrew___Morris
4. Lots of people scan the Internet.
I built a system that collects all of the Internet-wide scan traffic.
I analyze the data to find weird stuff.
I make that data available to researchers for free via an API.
6. Background
• Internet-wide mass scanning is easier than ever
• Open source tooling: Masscan, ZMap, UnicornScan, etc.
• Cloud computing
  • Instant servers
  • Large pool of recyclable IP addresses
• High-throughput, faster global Internet connections
7. What is Internet Mass Scanning?
• "Mass scanning" is scanning every single routable IP address on the Internet for something
• The IPv4 address space is 0.0.0.0 – 255.255.255.255
  • Give or take a few reserved blocks
• That's roughly 4.3 billion IP addresses
• Bandwidth-wise, a full sweep is roughly the same as uploading a 240 GB file
8. What does this mean?
• Lots and lots of people scanning the Internet, for lots of different things
• From millions of different IP addresses
• Benign: Shodan, Censys, Sonar, ShadowServer
• Malicious: SSH/Telnet worms (Mirai), IoT worms, Conficker, etc.
• Internet-wide scanning is busier than ever
9. This creates a problem
When you see an IP scanning your network, are they scanning you specifically or the entire Internet?
When you see an IP attacking your network, are they attacking you specifically or the entire Internet?
10. Solution
• Collect all the omnidirectional Internet-wide IPv4 scan/attack traffic
• Subtract those IPs/activity from your SIEM
• All the remaining activity is targeting you
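The subtraction step above can be sketched in a few lines. This is a minimal illustration, not the real GreyNoise pipeline; the function name and IPs are made up for the example.

```python
# Sketch: subtract known mass-scanner IPs from SIEM alert sources,
# leaving only traffic that is plausibly targeting you specifically.

def filter_targeted(alert_ips, noise_ips):
    """Return only alert source IPs NOT seen scanning the whole Internet."""
    return sorted(set(alert_ips) - set(noise_ips))

alerts = ["203.0.113.7", "198.51.100.9", "203.0.113.7", "192.0.2.44"]
noise = {"203.0.113.7", "198.51.100.200"}  # IPs observed by the collector network

print(filter_targeted(alerts, noise))  # the remainder warrants a closer look
```

In practice the "noise" set comes from the collector network's API rather than a hardcoded set, but the logic is the same.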
11. But how?
• Stand up a large number of servers in diverse data centers with no business value
  • No business value means that ANY traffic that hits them is, by definition, opportunistic
• Instrument these servers with extremely aggressive logging and small microservices
• Stream the logs of the scan/attack traffic to a central place
• Analyze the data and convert it into a consumable format
12. Barriers
• It is strategically cheaper to ask a question of the Internet than it is to answer a given question
  • "How many computers are running X version of software?" is easy
  • "How many computers are scanning for X version of software?" is hard
13. Byproducts
• Observe changes in Internet-wide scanning over time
• Opt out of omnidirectional scanning altogether
• Collect information on malware campaigns and botnets
14. History
• Like three honeypots (2014)
• Animus v1 (2015)
  • Bash and glue (ShmooCon 2015, "No Budget Threat Intelligence")
• Related work at a previous company (2015-2016)
• EPIPHANY (2016)
  • THE DATA THAT HONEYPOTS COLLECT IS SHITTY THREAT INTELLIGENCE
  • IT'S LITERALLY THE OPPOSITE OF THREAT INTELLIGENCE
  • IT'S ANTI-THREAT INTELLIGENCE
• Animus goes commercial (2017)
  • Turns out startups are hard
• Grey Noise (2018)
  • I'm not going to stop until I die
• ???
• Become a monk
25. Collection: Data Producers / Services
• Ridiculously aggressive iptables rules
  • Log all packets…
  • …on all ports…
  • …on all protocols
• SSH
• Telnet
• HTTP
• Others
26. Collection: Data Producers / Services (Lessons)
• MISTAKE: Tune your iptables / p0f / sniffers / whatever to ignore garbage and outbound traffic
• LESSON: Things will be spoofed (TCP, UDP, and ICMP)
• LESSON: Bang for your buck: iptables, HTTP, Telnet, SSH, and p0f
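As a rough illustration of what "log all packets" via iptables produces downstream, here is a minimal sketch that parses standard kernel LOG-target lines into structured events. The event shape and field handling are an assumption for illustration; real log lines vary by protocol and kernel version.

```python
import re

# Sketch: turn iptables LOG lines into structured events for the pipeline.
# SRC/DST/PROTO/SPT/DPT are standard fields in the kernel's LOG output.
LOG_RE = re.compile(
    r"SRC=(?P<src>\S+) DST=(?P<dst>\S+) .*?PROTO=(?P<proto>\S+)"
    r"(?: SPT=(?P<spt>\d+) DPT=(?P<dpt>\d+))?"
)

def parse_iptables_line(line):
    """Return a dict of scan-event fields, or None for non-matching lines."""
    m = LOG_RE.search(line)
    if not m:
        return None
    event = m.groupdict()
    if event["dpt"] is not None:  # TCP/UDP carry ports; ICMP does not
        event["spt"] = int(event["spt"])
        event["dpt"] = int(event["dpt"])
    return event

sample = ("kernel: IN=eth0 OUT= SRC=203.0.113.7 DST=198.51.100.5 "
          "LEN=40 TTL=243 ID=54321 PROTO=TCP SPT=43210 DPT=23 WINDOW=65535")
print(parse_iptables_line(sample))
```

Lines that don't match (local chatter, truncated output) simply return None, which is one cheap way to drop garbage before it hits the bus.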
31. Collection: Message Bus (Lessons)
• MISTAKE: Google PubSub
• LESSON: Maintain state
• LESSON: Meta message envelope
  • Time
  • Provider
  • Region
  • Node UUID
• POSSIBLE: ZeroMQ, Kafka
• Streamd
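The "meta message envelope" lesson amounts to wrapping every sensor event in the same metadata before it hits the bus. The four fields below follow the slide; everything else (function names, the payload shape) is illustrative.

```python
import json
import time
import uuid

# Sketch: wrap each raw sensor event in a metadata envelope (time,
# provider, region, node UUID) so consumers can do time-series and
# per-node analysis without guessing where a message came from.

NODE_ID = str(uuid.uuid4())  # assigned once per listener node

def envelope(payload, provider, region, node_id=NODE_ID, now=None):
    return {
        "time": now if now is not None else time.time(),
        "provider": provider,
        "region": region,
        "node": node_id,
        "payload": payload,
    }

msg = envelope({"src": "203.0.113.7", "dpt": 23},
               provider="do", region="nyc3", now=1700000000.0)
print(json.dumps(msg))
```

A stand-in for the real bus client would then publish `json.dumps(msg)`; the point is that the envelope is applied uniformly at the edge, not reconstructed centrally.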
32. Collection: Log Forwarder
•I wrote my own
•Python + Pygtail / iNotify / Watchdog
•Can also use something that’s already been
written
•Logstash
•Elasticsearch Filebeat
•Rsyslog
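The core of a home-grown forwarder like the one above is just "read whatever was appended since last time, remember the offset." A dependency-free sketch (real tools like Pygtail and Filebeat also persist the offset across restarts and handle log rotation):

```python
import os

# Sketch: incremental log tailing, the heart of a log forwarder.
def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for bytes appended after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode().splitlines(), offset + len(data)

# Demo against a temp file standing in for a sensor's auth log.
path = "demo.log"
with open(path, "w") as f:
    f.write("login attempt root/root\n")
lines, off = read_new_lines(path, 0)          # first pass picks up line 1
with open(path, "a") as f:
    f.write("login attempt admin/admin\n")
more, off = read_new_lines(path, off)         # second pass: only the new line
os.remove(path)
print(lines, more)
```

Each batch of `new_lines` would then be enveloped and shipped to the message bus.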
35. Analysis: Cache / Database
• PostgreSQL
  • N days of data, rotates
  • Fast-ish
  • Robust
• Dumpster (long-term storage)
  • You're going to fuck something up
  • Retro-load is your friend
36. Analysis: Cache / Database (Lessons)
• MISTAKE: Postgres is awesome but too slow for data this big
• MISTAKE: Google BigQuery is the shit, but it gets expensive if you're doing batch queries on a very short timeline
• LESSON: Postgres + Cassandra is the truth
39. Analysis: Enrichments
• We need:
  • ASN
  • rDNS
  • Organization
  • Country
  • City
• MaxMind is expensive
• Neustar is expensive
• ipinfo is CHEAP
• Harvesting it yourself is also CHEAP, but requires a lot of effort
40. Analysis: Enrichments (Lessons)
• MISTAKE: Collecting the data yourself is hard, inconsistent, and involves a lot of work
• LESSON: ARIN has an unauthenticated, non-rate-limited public API for IP ownership
• LESSON: Enrichd
• LESSON: Cache rules everything around me
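"Cache rules everything around me" in practice: since scanner IPs repeat constantly, memoizing enrichment lookups collapses millions of events into a handful of real API calls per window. A minimal sketch, where `fetch_enrichment` is a hypothetical stand-in for an ipinfo/ARIN client:

```python
from functools import lru_cache

# Sketch: cache enrichment lookups so each IP's ASN/geo is fetched once.
CALLS = {"n": 0}

def fetch_enrichment(ip):
    CALLS["n"] += 1  # pretend this is a slow, billable network call
    return {"ip": ip, "asn": "AS64496", "country": "US"}

@lru_cache(maxsize=100_000)
def enrich(ip):
    # lru_cache needs hashable return values, so freeze the dict
    return tuple(sorted(fetch_enrichment(ip).items()))

for ip in ["203.0.113.7", "203.0.113.7", "198.51.100.9"]:
    enrich(ip)
print(CALLS["n"])  # only 2 real lookups for 3 events
```

A real deployment would use a shared cache (Redis, etc.) with a TTL, since enrichment data like ASN ownership does change, just slowly.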
44. Analysis: Analyticsd
• Service to analyze some time window of data
  • E.g. the past 4 days of data
• Catalogue:
  • Actors
    • Shodan
    • Censys
    • Sonar
  • Activity
    • Scanning for SSH
    • Scanning for Telnet
• LESSON: YOU PROBABLY DON'T NEED REAL-TIME ANALYTICS
  • Batch analytics with small time frames
  • This is why Postgres will often do the trick
• LESSON: Only pay attention to activity that has happened on more than one of your nodes
• LESSON: You need to know how many nodes are up and collecting data at any point in time to properly do a time-series analysis
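The "more than one node" lesson can be sketched as a simple batch pass over one window of events: group by source IP, count distinct sensor nodes, drop anything seen by only one. The event shape here is illustrative.

```python
from collections import defaultdict

# Sketch: one batch-analytics pass over a time window of events,
# keeping only source IPs observed by >= min_nodes distinct sensors.
def multi_node_scanners(events, min_nodes=2):
    nodes_by_ip = defaultdict(set)
    for e in events:
        nodes_by_ip[e["src"]].add(e["node"])
    return {ip for ip, nodes in nodes_by_ip.items() if len(nodes) >= min_nodes}

window = [
    {"src": "203.0.113.7", "node": "nyc3", "dpt": 23},
    {"src": "203.0.113.7", "node": "sfo2", "dpt": 23},   # same IP, second node
    {"src": "198.51.100.9", "node": "nyc3", "dpt": 445}, # one node only: dropped
]
print(multi_node_scanners(window))
```

Filtering this way cheaply discards spoofed packets and one-off noise, since a genuine Internet-wide scanner will land on many geographically separate nodes.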
47. Consumption: API
• Web API
  • Tell me about this IP address
  • Tell me about this analytic
• GitHub
  • Search "Grey Noise API"
  • github.com/Grey-Noise-Intelligence
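A "tell me about this IP" call can be sketched with nothing but the standard library. The endpoint path and parameter below follow the v1 query/ip shape the project documented around this time; treat both as assumptions and check the current GreyNoise documentation before relying on them.

```python
from urllib import request

# Sketch: build a "tell me about this IP" query against the web API.
# The base URL and endpoint are assumptions based on the v1-era docs.
API_BASE = "http://api.greynoise.io/v1"

def build_ip_query(ip):
    """Return a ready-to-send POST request asking about one IP."""
    data = ("ip=%s" % ip).encode()
    return request.Request(API_BASE + "/query/ip", data=data, method="POST")

req = build_ip_query("198.51.100.9")
print(req.full_url, req.data)
# resp = json.loads(request.urlopen(req).read())  # actual network call
```

The community bindings on the next slide wrap exactly this kind of call in friendlier interfaces.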
48. Consumption: Bindings
• Bobby Filar: phyler/greynoise
• Tek: PyGreyNoise
• Bob Rudis: R bindings
• Some mystery Go bindings out there
49. Consumption: FRONT END
• Complete 100% credit to Casey Buto (github.com/cbuto)
• Point and click interface
• Hosted version at viz.greynoise.io
• EXPLORE THE DATA
52. OpSec (Operational Security)
• Hard to fingerprint (mostly custom services)
• Encrypt everything
• No names
• Ops domains
• Dockerize
• Shift infrastructure constantly
• Reduce the oracle surface
• IO is hard to opsec
• Minimum number / node thresholds
• Sleep delays
53. Cost
• AWS: 15 regions
  • $4.75 per box
  • Total: $71
• DigitalOcean: 11 regions
  • $5 per box
  • Total: $55
• Google: 36 regions
  • $4.28 per box
  • Total: $154
• Vultr: 15 regions
  • $5 per box (they advertise $2.50, but those are never available)
  • Total: $75
• Linode: 9 regions
  • $5 per box
  • Total: $45
• Grand total: ~$400 per month
54. Cost (notes)
• No ops boxes in here (you need these)
• This is simply not enough for complete coverage, but it'll give you a good start
• You can save money by buying extra IPs, but it complicates engineering
56. Analysis
• What am I collecting?
• Volume Summary
• Data Summary
• Actor Summary
  • Benign
  • Malicious
  • Unknown???
• Malware Summary
• Hall of Shame (malware-iest regions of the Internet)
• WEIRD SHIT
• Misc Lessons
57. What am I collecting?
• Passive
  • iptables – packets on ports
  • p0f – passive OS fingerprinting
  • JA3 – SSL/TLS fingerprinting (stick around!)
• Active
  • HTTP
  • SSH
  • Telnet
• Experimental
  • RDP
  • SIP
  • SMTP
  • NTP
  • TFTP
  • DNS
58. Data Summary
• iptables:
  • I don't have a good way to quantify this yet
• HTTP:
  • Lots of "/", spoofed user agents, search engines, people looking for JBoss/WordPress/Tomcat/phpMyAdmin
• SSH + Telnet:
  • Bots. Default cred attempts. Nothing new here.
• p0f:
  • Lots of OS visibility
59. Volume Summary
• With the aforementioned numbers ($400 worth of servers):
  • 1M – 2M iptables events per day
  • 700K – 1M SSH logins per day
  • 1M – 10M Telnet logins per day
  • 10K – 100K HTTP requests per day
  • 100 – 200 messages per second through your queue
  • ~60K unique IPs per day
  • ~1 GB of raw data per day, msgpacked + compressed
64. Pretenders
• Machines advertising client banners that are false
  • Mismatches between user agent, p0f OS fingerprint, and JA3
  • Is the browser hitting this HTTP server really running Safari on a Linux kernel 3.1 box? Is it?
• Why? Idk
65. Dangling DNS
• When you spin a bunch of IPs up and down, it's not uncommon to inherit an IP address from your cloud provider that still has a domain pointing to it
  • CDN.whatever123.acme.com
• This traffic is dirty; you don't want it
66. "WORM FINDER"
• Sometimes when Grey Noise observes an IP address scanning for a given TCP port, I'll turn around and check to see if that port is open on the source machine
• If the answer is yes, this can be a great indicator of a worm
  • Why else would a computer search for behavior that it also exhibits?
• Average lifespan from start to finish is 4 days
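The worm-finder check itself is tiny: when an IP is seen scanning port N, probe whether port N is also open on the scanner. A minimal sketch with a hypothetical helper name:

```python
import socket

# Sketch: is the port this host scans for also open on the host itself?
# A "yes" is a decent worm indicator, per the slide above.
def scans_own_port(ip, port, timeout=2.0):
    """Return True if `port` accepts connections on the scanning host `ip`."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A real deployment would rate-limit and randomize these probes so the check itself doesn't become a fingerprintable oracle (see the OpSec slide).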
67. ZMap's hardcoded ID parameter
• ZMap hardcodes all packets it creates with an IP ID of 54321, making it trivial to fingerprint
• Go to github.com/zmap/zmap and search/grep the repository for "54321"
• Shoutout to Oliver Gasser @ Technical University of Munich
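Checking for that fingerprint takes one line once you have the raw IPv4 header: the identification field lives in bytes 4-5. A minimal sketch:

```python
import struct

# Sketch: flag probable ZMap probes by the hardcoded IP ID of 54321.
ZMAP_IP_ID = 54321

def looks_like_zmap(ip_header):
    """ip_header: the first 20 bytes of an IPv4 packet."""
    (ident,) = struct.unpack("!H", ip_header[4:6])  # IP ID, network byte order
    return ident == ZMAP_IP_ID

# Minimal fake IPv4 header with ID=54321 in bytes 4-5 (rest zeroed).
hdr = bytearray(20)
hdr[4:6] = struct.pack("!H", ZMAP_IP_ID)
print(looks_like_zmap(bytes(hdr)))  # True
```

In a live pipeline you'd pull the header bytes from a pcap or raw socket rather than fabricating them, but the field offset is the same.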
68. Still SO MANY WINDOWS WORMS
• LOADS of people blasting SMB traffic at TCP port 445
• More and more RDP worms as well, but these aren't exploiting vulns, just guessing creds
• WinRM is next, in my opinion
69. People do weird stuff through proxies
• Airline price-scraping data (???)
• Also testing stolen credentials
• And probably credit card numbers
• News sites??? This is a huge rabbit hole…
70. Lots of robocalls probably come from popped SIP boxes
• People try to make calls to India and Russia through open VoIP servers
• Like, LOTS of them
• Tens of thousands per day
72. You can neuter/blow up worms by replaying their own traffic back to them
• A box is compromised with a Telnet worm
• The worm carries a built-in wordlist
• The compromised box throws the same wordlist at you
• You replay the wordlist back to the compromised box
• Chances are, depending on the worm, one of those credentials will work
74. What does the future hold?
• Version 1.1 API coming very soon
• Integrate with everything
• Badass machine learning opportunities
• Explore identifying anti-threat intelligence in other areas
  • Intranet traffic
  • DMZ traffic
  • Files on a filesystem
76. Conclusion
• The Internet is a noisy place
• Every packet has a story
• It's possible to collect all of this background noise
• If you want to explore the data, hit the API. If the API doesn't give you what you need, email me or hit me up on Twitter
77. Acknowledgements
• Phil Maddox (twitter.com/foospidy)
• Bobby Filar (twitter.com/filar)
• Rich Seymour (twitter.com/rseymour)
• Casey Buto (github.com/cbuto)
• Bob Rudis (twitter.com/hrbrmstr)
• Tek (twitter.com/tenacioustek)
• Mickey Perre (twitter.com/MickeyPerre)
• Michel Oosterhof (twitter.com/micheloosterhof)