The document discusses using machine learning models to detect anomalies in DNS traffic from NetFlow data in order to identify attacks targeting DNS servers or using DNS. It provides examples of attack definitions that could be identified from NetFlow data patterns and characteristics. It also presents sample NetFlow conversation data and discusses how the data tells a story and how machine learning models can be built and trained to detect strange or anomalous DNS activities and potential attacks.
This document discusses the author's 12 years of experience defending DNS infrastructure and observing various cyber attacks. It describes how the author migrated to more secure DNS configurations, deployed logging and analytics to gain insights into attacks, and incorporated additional defenses like NIDS and RPZ. It also discusses using NetFlow and Zeek to analyze network traffic patterns and investigate DNS queries in more depth. The overall experience highlighted the value of a multi-layered approach to DNS security through continuous monitoring, analytics, and active defenses.
Mr. Donald Rumsfeld, former Defence Secretary of USA, stated in his book "Known and Unknown: A Memoir" that "There are known knowns, things we know that we know; and there are known unknowns, things that we know we don't know. But there are also unknown unknowns, things we do not know we don't know." And to know that unknowns of the unknown, my journey with the APNIC honeynet project started and I am going to share my experiences here in this talk.
The document summarizes ICT facilities and services at Khon Kaen University. It describes the university's campus network infrastructure, wireless network, IT services like email and e-learning platforms, software pricing, guidelines around software piracy and licensed/open source software, computer crime laws, risks of malicious software, and tips for protecting privacy online.
Who's Watching, by Geoff Huston [APNIC 38 / Technical Keynote]APNIC
The document discusses how an organization called APNIC uses Google Ads to deliver test scripts to users to measure technology deployment. APNIC found that around 1 in 400 users had their unique URLs retrieved from APNIC servers by a different IP address, indicating some kind of digital stalking. The top stalkers were found to be networks in China, with the largest stalker being a backbone network in China retrieving over 300,000 unique URLs from different origins. This suggests the existence of local scriptware, web proxies, or active middleware in China that is harvesting URLs from users.
The document summarizes security issues related to bike sharing services Smide and oBike. It describes the tracking and booking processes of Smide, including vulnerabilities found in the authentication and authorization of the Smide API. It also analyzes the Bluetooth Low Energy communication and encryption used by oBike bikes. The document concludes that while security flaws are inevitable, companies should perform security audits and make systems fixable to address issues when found.
The document discusses blockchain technology and cryptocurrencies. It notes that blockchain is a tool that is not a silver bullet and has pros and cons. Blockchain alone does not solve all problems and requires proper usage to deliver results. The document also discusses trends toward increased efficiency in transaction processing infrastructure from CPUs to ASICs, and how this has led to dramatically higher speeds and lower power consumption over time. Micro-payments and technologies like the Lightning Network are also discussed as enabling new payment paradigms by breaking traditional payment granularity barriers.
This document discusses the author's 12 years of experience defending DNS infrastructure and observing various cyber attacks. It describes how the author migrated to more secure DNS configurations, deployed logging and analytics to gain insights into attacks, and incorporated additional defenses like NIDS and RPZ. It also discusses using NetFlow and Zeek to analyze network traffic patterns and investigate DNS queries in more depth. The overall experience highlighted the value of a multi-layered approach to DNS security through continuous monitoring, analytics, and active defenses.
Mr. Donald Rumsfeld, former Defence Secretary of USA, stated in his book "Known and Unknown: A Memoir" that "There are known knowns, things we know that we know; and there are known unknowns, things that we know we don't know. But there are also unknown unknowns, things we do not know we don't know." And to know that unknowns of the unknown, my journey with the APNIC honeynet project started and I am going to share my experiences here in this talk.
The document summarizes ICT facilities and services at Khon Kaen University. It describes the university's campus network infrastructure, wireless network, IT services like email and e-learning platforms, software pricing, guidelines around software piracy and licensed/open source software, computer crime laws, risks of malicious software, and tips for protecting privacy online.
Who's Watching, by Geoff Huston [APNIC 38 / Technical Keynote]APNIC
The document discusses how an organization called APNIC uses Google Ads to deliver test scripts to users to measure technology deployment. APNIC found that around 1 in 400 users had their unique URLs retrieved from APNIC servers by a different IP address, indicating some kind of digital stalking. The top stalkers were found to be networks in China, with the largest stalker being a backbone network in China retrieving over 300,000 unique URLs from different origins. This suggests the existence of local scriptware, web proxies, or active middleware in China that is harvesting URLs from users.
The document summarizes security issues related to bike sharing services Smide and oBike. It describes the tracking and booking processes of Smide, including vulnerabilities found in the authentication and authorization of the Smide API. It also analyzes the Bluetooth Low Energy communication and encryption used by oBike bikes. The document concludes that while security flaws are inevitable, companies should perform security audits and make systems fixable to address issues when found.
The document discusses blockchain technology and cryptocurrencies. It notes that blockchain is a tool that is not a silver bullet and has pros and cons. Blockchain alone does not solve all problems and requires proper usage to deliver results. The document also discusses trends toward increased efficiency in transaction processing infrastructure from CPUs to ASICs, and how this has led to dramatically higher speeds and lower power consumption over time. Micro-payments and technologies like the Lightning Network are also discussed as enabling new payment paradigms by breaking traditional payment granularity barriers.
The document introduces SolarWinds Network Performance Monitor (NPM) software for network monitoring. It begins with an overview of SolarWinds as a company and network management basics. The presentation then demonstrates NPM's features for monitoring network routes, interfaces, multicast traffic and other performance data. It includes a demo of NPM and discusses licensing options. The document concludes with information on resources like trials, webinars, white papers and ROI tools to learn more about NPM network monitoring capabilities.
The document is a presentation about SolarWinds Network Performance Monitor (NPM) software. It introduces NPM, which provides automated network monitoring and alerts. The presentation covers new features in NPM version 10.5 like route and multicast monitoring. It also includes a customer case study, demo of the software, and discusses licensing costs and ROI. Attendees are encouraged to ask questions.
Enhancing Computer Security via End-to-End Communication Visualization amiable_indian
This document summarizes a novel computer security visualization tool called Network Eye. Network Eye provides an end-to-end visualization of network and host activities to help analyze computer security. It integrates multiple views, including network maps, host views, and packet traces. The tool could help reduce training costs and make administrators more effective. Partnerships between academia and businesses are discussed to jointly develop Network Eye further.
Handy Networking Tools and How to Use ThemSneha Inguva
Linux networking tools can be used to analyze network connectivity and performance. Tools like ifconfig show interface configurations, route displays routing tables, arp shows the ARP cache, dig/nslookup resolve DNS, and traceroute traces the network path. Nmap scans for open ports, ping checks latency, and tcpdump captures traffic. Iperf3 and wrk2 can load test throughput and capacity, while tcpreplay replays captured traffic. These CLI tools provide essential network information and testing capabilities from the command line.
Cloudflare lower network latency = faster website loadsVu Long Tran
Lower network latency = faster website loads
This document discusses how lower network latency leads to faster website loads. It explains that round-trip time (RTT) measures the time for a network request to travel from one point to another and back. Faster protocols like TLS 1.3 can reduce RTT by eliminating handshake rounds. Content delivery networks (CDNs) also reduce latency by caching content closer to users worldwide. New routing technologies like Cloudflare's Argo can improve performance by up to 59% by choosing network paths based on real-time conditions instead of following indirect routes. Overall, the document emphasizes that lower latency is key to optimizing the user experience on the web.
Software Defined Networks
By: Thierry Couture, Consulting Systems Architect
There is currently a lot of buzz around OpenFlow and Software Defined Networks (SDN) in the industry. It would be a mistake to think that these are one and the same. The reality is that the current market conversation has loose semantics mixed in with hyperbole and hearsay that hide the simplicity of SDN behind terms like Openstack, Virtual Overlays, Network Function Virtualization, Orchestration, etc. This session will explain the origins of SDN, establish a basic terminology for SDN concepts, and offer a framework to both understand these trends and distill the applicability of SDN through a use case lens.
The document discusses various techniques for improving web caching performance, including collapsed forwarding, cache peering using protocols like ICP and HTCP, and using cache headers like stale-while-revalidate and stale-if-error. It also covers ways caching servers like Squid handle non-cacheable responses, such as ignoring certain cache control directives and authentication, and configuring refresh patterns.
Presentation given by Sanjaya, APNIC Deputy Director, at the the Indonesian Network Information Centre’s Open Policy Meeting (IDNIC OPM) held in Batam, Indonesia from 30 to 31 May 2016
This document is a transaction record from Deutsche Bank AG transferring funds to Ambank in Malaysia. It includes details of the transfer such as dates, amounts, bank and account details for both the sending and receiving parties. Deutsche Bank confirms the funds are clean and unencumbered by legal issues, and allows manual download of the transferred funds.
Big Data Week - L'impact du Big Data sur l'intelligence urbaine - FuturoCité ...Julie Roger
La ville intelligente: une réalité d’aujourd’hui au service des citoyens, des communautés et des autorités… grâce aux progrès technologiques Un facteur d’attractivité pour nos villes de demain ! Dans le cadre de la Big Data Week, Futurocité, et ses partenaires IBM et Mobistar ont présenté l’importance et l’utilité des données (Big data et analytics) pour offrir de nouveaux services "Smart city". Comment et pourquoi est-il important d’exploiter, de traiter et d’analyser les données afin de fournir une information cohérente et intelligente pour une gouvernance plus efficace au bénéfice du bien-être des citoyens et la prospérité de leurs communauté.
Supporting Real-time Traffic: Preparing Your IP Network for ...Videoguy
This document provides guidance on preparing an IP network to support real-time video conferencing traffic. It discusses how real-time traffic differs from typical data traffic in terms of bandwidth utilization and sensitivity to delay and packet loss. It recommends implementing Quality of Service (QoS) using Differentiated Services across the network to prioritize real-time traffic. The document also covers classifying and managing bandwidth demand, and testing and monitoring the network to support video conferencing.
This document provides a summary of common Linux network tools including ifconfig, netstat, route, ping, traceroute, iptables, netcat, rinetd, tcpdump, and tcpreplay. It describes what each tool is used for at a high level, such as configuring network interfaces, displaying network status, manipulating network routes, testing network connectivity, implementing firewalls, and capturing/replaying network traffic. The document also provides basic introductions to IPv4 and IPv6 addressing and routing concepts.
The document is a confirmation email registering the recipient for four upcoming webinars on 5G topics. It provides the title, date, start time, duration, and URL for each webinar, which will cover 5G network ecosystem and services, operator and cloud provider collaboration, edge infrastructure and services, and network slicing and service management automation.
The document discusses network security topics including SIEM, logs, NetFlow, web logs, and security standards. It provides examples of configuring Cisco routers to collect NetFlow data and export it to a SIEM system. Splunk and HP ArcSight are mentioned as examples of SIEM systems that can aggregate and correlate log data from various sources for security monitoring, analysis, and incident response.
The document provides an overview of network security topics including SIEM, logs, NetFlow, web logs, and compliance standards. It discusses how SIEM systems aggregate and correlate log/event data from multiple sources to provide security monitoring, incident response, forensic analysis and compliance reporting capabilities. Specific topics covered include syslog, NetFlow for network monitoring, and examples of web server logs and the types of data that can be extracted from logs for security purposes. Compliance standards like PCI-DSS and SOX are also mentioned in relation to why log collection and monitoring is important for audit requirements.
DIY Internet: Snappy, Secure Networking with MinimaLT (JSConf EU 2013)Igalia
By Andy Wingo.
Refreshing your Twitter feed is such a drag over 3G, taking forever to connect and fetch those precious kilobytes. The reasons for this go deep into the architecture of the internet: making an HTTPS connection simply has terrible latency.
So let’s fix the internet! MinimaLT is an exciting new network protocol that connects faster than TCP, is more secure than TLS (crypto by DJ Bernstein), and allows mobile devices to keep connections open as they change IP addresses. This talk presents the MinimaLT protocol and a Node library that allows JS hackers to experimentally build a new Internet.
OSTU - Sake Blok on Packet Capturing with TsharkDenny K
Sake Blok, a Wireshark/Ethereal devotee since 1999, works as a Research & Development Engineer for ion-ip in the Netherlands (http://www.ionip.com) . His company provides solutions to customers who want to deliver their applications to users in a fast, secure, efficient and scalable manner. Sake\'s main focus is to take new products for a spin in their test environment, design custom solutions for customers and troubleshoot the problems customers might encounter while using ion-ip solutions. Two years ago (2006), Sake started to add the functionality he was missing to Wireshark. He also started to fix Wireshark-bugs that were reported on Bugzilla. This work on Wireshark resulted in an invitation from Gerald Combs to join the Wireshark Core Development Team in 2007.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
More Related Content
Similar to Can artificial intelligence secure your infrastructure
The document introduces SolarWinds Network Performance Monitor (NPM) software for network monitoring. It begins with an overview of SolarWinds as a company and network management basics. The presentation then demonstrates NPM's features for monitoring network routes, interfaces, multicast traffic and other performance data. It includes a demo of NPM and discusses licensing options. The document concludes with information on resources like trials, webinars, white papers and ROI tools to learn more about NPM network monitoring capabilities.
The document is a presentation about SolarWinds Network Performance Monitor (NPM) software. It introduces NPM, which provides automated network monitoring and alerts. The presentation covers new features in NPM version 10.5 like route and multicast monitoring. It also includes a customer case study, demo of the software, and discusses licensing costs and ROI. Attendees are encouraged to ask questions.
Enhancing Computer Security via End-to-End Communication Visualization amiable_indian
This document summarizes a novel computer security visualization tool called Network Eye. Network Eye provides an end-to-end visualization of network and host activities to help analyze computer security. It integrates multiple views, including network maps, host views, and packet traces. The tool could help reduce training costs and make administrators more effective. Partnerships between academia and businesses are discussed to jointly develop Network Eye further.
Handy Networking Tools and How to Use ThemSneha Inguva
Linux networking tools can be used to analyze network connectivity and performance. Tools like ifconfig show interface configurations, route displays routing tables, arp shows the ARP cache, dig/nslookup resolve DNS, and traceroute traces the network path. Nmap scans for open ports, ping checks latency, and tcpdump captures traffic. Iperf3 and wrk2 can load test throughput and capacity, while tcpreplay replays captured traffic. These CLI tools provide essential network information and testing capabilities from the command line.
Cloudflare lower network latency = faster website loadsVu Long Tran
Lower network latency = faster website loads
This document discusses how lower network latency leads to faster website loads. It explains that round-trip time (RTT) measures the time for a network request to travel from one point to another and back. Faster protocols like TLS 1.3 can reduce RTT by eliminating handshake rounds. Content delivery networks (CDNs) also reduce latency by caching content closer to users worldwide. New routing technologies like Cloudflare's Argo can improve performance by up to 59% by choosing network paths based on real-time conditions instead of following indirect routes. Overall, the document emphasizes that lower latency is key to optimizing the user experience on the web.
Software Defined Networks
By: Thierry Couture, Consulting Systems Architect
There is currently a lot of buzz around OpenFlow and Software Defined Networks (SDN) in the industry. It would be a mistake to think that these are one and the same. The reality is that the current market conversation has loose semantics mixed in with hyperbole and hearsay that hide the simplicity of SDN behind terms like Openstack, Virtual Overlays, Network Function Virtualization, Orchestration, etc. This session will explain the origins of SDN, establish a basic terminology for SDN concepts, and offer a framework to both understand these trends and distill the applicability of SDN through a use case lens.
The document discusses various techniques for improving web caching performance, including collapsed forwarding, cache peering using protocols like ICP and HTCP, and using cache headers like stale-while-revalidate and stale-if-error. It also covers ways caching servers like Squid handle non-cacheable responses, such as ignoring certain cache control directives and authentication, and configuring refresh patterns.
Presentation given by Sanjaya, APNIC Deputy Director, at the the Indonesian Network Information Centre’s Open Policy Meeting (IDNIC OPM) held in Batam, Indonesia from 30 to 31 May 2016
This document is a transaction record from Deutsche Bank AG transferring funds to Ambank in Malaysia. It includes details of the transfer such as dates, amounts, bank and account details for both the sending and receiving parties. Deutsche Bank confirms the funds are clean and unencumbered by legal issues, and allows manual download of the transferred funds.
Big Data Week - L'impact du Big Data sur l'intelligence urbaine - FuturoCité ...Julie Roger
La ville intelligente: une réalité d’aujourd’hui au service des citoyens, des communautés et des autorités… grâce aux progrès technologiques Un facteur d’attractivité pour nos villes de demain ! Dans le cadre de la Big Data Week, Futurocité, et ses partenaires IBM et Mobistar ont présenté l’importance et l’utilité des données (Big data et analytics) pour offrir de nouveaux services "Smart city". Comment et pourquoi est-il important d’exploiter, de traiter et d’analyser les données afin de fournir une information cohérente et intelligente pour une gouvernance plus efficace au bénéfice du bien-être des citoyens et la prospérité de leurs communauté.
Supporting Real-time Traffic: Preparing Your IP Network for ...Videoguy
This document provides guidance on preparing an IP network to support real-time video conferencing traffic. It discusses how real-time traffic differs from typical data traffic in terms of bandwidth utilization and sensitivity to delay and packet loss. It recommends implementing Quality of Service (QoS) using Differentiated Services across the network to prioritize real-time traffic. The document also covers classifying and managing bandwidth demand, and testing and monitoring the network to support video conferencing.
This document provides a summary of common Linux network tools including ifconfig, netstat, route, ping, traceroute, iptables, netcat, rinetd, tcpdump, and tcpreplay. It describes what each tool is used for at a high level, such as configuring network interfaces, displaying network status, manipulating network routes, testing network connectivity, implementing firewalls, and capturing/replaying network traffic. The document also provides basic introductions to IPv4 and IPv6 addressing and routing concepts.
The document is a confirmation email registering the recipient for four upcoming webinars on 5G topics. It provides the title, date, start time, duration, and URL for each webinar, which will cover 5G network ecosystem and services, operator and cloud provider collaboration, edge infrastructure and services, and network slicing and service management automation.
The document discusses network security topics including SIEM, logs, NetFlow, web logs, and security standards. It provides examples of configuring Cisco routers to collect NetFlow data and export it to a SIEM system. Splunk and HP ArcSight are mentioned as examples of SIEM systems that can aggregate and correlate log data from various sources for security monitoring, analysis, and incident response.
The document provides an overview of network security topics including SIEM, logs, NetFlow, web logs, and compliance standards. It discusses how SIEM systems aggregate and correlate log/event data from multiple sources to provide security monitoring, incident response, forensic analysis and compliance reporting capabilities. Specific topics covered include syslog, NetFlow for network monitoring, and examples of web server logs and the types of data that can be extracted from logs for security purposes. Compliance standards like PCI-DSS and SOX are also mentioned in relation to why log collection and monitoring is important for audit requirements.
DIY Internet: Snappy, Secure Networking with MinimaLT (JSConf EU 2013)Igalia
By Andy Wingo.
Refreshing your Twitter feed is such a drag over 3G, taking forever to connect and fetch those precious kilobytes. The reasons for this go deep into the architecture of the internet: making an HTTPS connection simply has terrible latency.
So let’s fix the internet! MinimaLT is an exciting new network protocol that connects faster than TCP, is more secure than TLS (crypto by DJ Bernstein), and allows mobile devices to keep connections open as they change IP addresses. This talk presents the MinimaLT protocol and a Node library that allows JS hackers to experimentally build a new Internet.
OSTU - Sake Blok on Packet Capturing with TsharkDenny K
Sake Blok, a Wireshark/Ethereal devotee since 1999, works as a Research & Development Engineer for ion-ip in the Netherlands (http://www.ionip.com) . His company provides solutions to customers who want to deliver their applications to users in a fast, secure, efficient and scalable manner. Sake\'s main focus is to take new products for a spin in their test environment, design custom solutions for customers and troubleshoot the problems customers might encounter while using ion-ip solutions. Two years ago (2006), Sake started to add the functionality he was missing to Wireshark. He also started to fix Wireshark-bugs that were reported on Bugzilla. This work on Wireshark resulted in an invitation from Gerald Combs to join the Wireshark Core Development Team in 2007.
Similar to Can artificial intelligence secure your infrastructure (20)
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Can artificial intelligence secure your infrastructure
1. Can Artificial Intelligence Secure your
Infrastructure ‘?’
A. S. M. Shamim Reza
Deputy Manager, NOC
Link3 Technologies Limited
October 28 - 30, 2019
Lyon Convention Centre
Lyon, France
2. I propose to consider the question,
"Can machines think?"
– Alan Turing 1950
’’
3. [~]$whoami|– 10+ years, working for Link3 Technologies Limited
– InfoSec Professional
EC-Council Certified Security Analyst
sohag.shamim@gmail.com
shamimreza@link3.net
@asmshamimreza on Linkedin
@shamimrezasohag on Twitter
4. Agenda
The research work was basically motivated to detect Anomaly in DNS traffic, from
NetFlow data, incorporating Machine Learning models.
– What type of attacks that use DNS protocol or target DNS server?
– What are the characteristics of these attacks?
– How NetFlow conversation tells a story?
– Which Machine Learning model helps detect these attacks and how ?
– Which Machine Learning model can detect strange activities ?
– How the data has to be processed ?
– How the ML Model can be build ?
5. “it is better and more useful to meet a
problem in time than to seek a remedy after
the damage is done”
A Latin Proverb, found in 13th
century ‘‘
‘‘
6. DNS NetFlow Anomaly
“The Domain Name
Server (DNS) is the
Achilles heel of the Web.
The important thing is
that it's managed
responsibly.”
“It is network protocol
developed by Cisco to
collect and monitor
network traffic flow data.”
“Anomaly detection, is the
process of finding data
objects with behaviors that
are very different from
expectation. Such objects
are called anomalies.”
Tim Berners-Lee wikipedia Data Mining. Concepts and
Techniques by Jiawei Han
8. Attack Definition in-terms of NetFlow
Attacks that Target DNS Server
Recursive Query Attacks
Cache Poisoning Attacks
Buffer overflow
Port Scan
Attacks using DNS server –
Reflection Attack
DNS Tunneling
– It is very hard, to give an conclusive answer for a particular attack, if the amount of captured packet is low.
– For DoS, the amount of traffic can be high, but if the DNS package is vulnerable, only few packets can perform a
successful attack.
– Increase amount of query request to a DNS server can be counted as, Cache-Poisoning
– Buffer overflow, packet size have to be large enough and it has to be manipulated
– But, a small amount of packet info can show that port scan is running
– DNS tunneling, could be the same as DoS, due to increase of traffic, larger packets, and using of BASE64 & BASE32
character encoding, generate the queries which is strange.
9. Attack Definition in-terms of NetFlow
Attacks that Target DNS Server
Recursive Query Attacks
Cache Poisoning Attacks
Buffer overflow
Port Scan
Attacks using DNS server –
Reflection Attack
DNS Tunneling
Amplification Attack
(proto udp and src port 53 and packets > 1000) or (proto udp and src port 53 and packets < 1001
and bytes > 900 )
Scan Attack/Scanning proto tcp and dst port 53 and packets => 1 and bytes == 58
Zero DNS Packet proto udp/tcp and src/dst port 53 and packets => 1 and bytes <= 40
Acting Open-resovlers proto udp/tcp and src port 53 and packets => 1 and bytes == 45
DNS Tunneling proto udp/tcp and src/dst port 53 and packets => 1 and (151 => bytes <= 160)
16. Classification of Anomaly in Netflow
Point Anomaly – A data point that differ from the rest of
the data set
Contextual Anomaly – If an observation is strange because
of the context of the observation.
Collective Anomaly – A collection of data instances are
strange in observation.
17. One should look for a possible alternative, and
provide against it.
It is the first rule of criminal investigation.
Sherlock Holmes, the adventure of black peter
18. The Process for AI – standard procedure
Data
Collection
Data
Processing
The
Algorithm
Training
& Testing
Evaluation
– Data quality &
quantity is very
important
– we have collected
data from 109 routers
– the volume is about
75TB
(though we have worked
only on few selected events)
19. The Process for AI – standard procedure
Data
Collection
Data
Processing
The
Algorithm
Training
& Testing
Evaluation
– Data quality &
quantity is very
important
– we have collected
data from 109 routers
– the volume is about
75TB
(though we have worked
only on few selected events)
– manual processed the
data
– randomized it
– labeled it accordingly
– used libraries to
normalize & label
20. The Process for AI – standard procedure
Data
Collection
Data
Processing
The
Algorithm
Training
& Testing
Evaluation
– Data quality &
quantity is very
important
– we have collected
data from 109 routers
– the volume is about
75TB
(though we have worked
only on few selected events)
– manual processed the
data
– randomized it
– labeled it accordingly
– used libraries to
normalize & label
– as our work is to
predict, kind of
YES/NO, so we moved
for Supervised learning
and choose classification
algorithm
21. The Process for AI – standard procedure
Data
Collection
Data
Processing
The
Algorithm
Training
& Testing
Evaluation
– Data quality &
quantity is very
important
– we have collected
data from 109 routers
– the volume is about
75TB
(though we have worked
only on few selected events)
– manual processed the
data
– randomized it
– labeled it accordingly
– used libraries to
normalize & label
– as our work is to
predict, kind of
YES/NO, so we moved
for Supervised learning
and choose classification
algorithm
– started with small
amount of data
– it has 5k of event
– split it for test & train
– again used 80K event
sample for test & train
22. The Process for AI – standard procedure
Data
Collection
Data
Processing
The
Algorithm
Training
& Testing
Evaluation
– Data quality &
quantity is very
important
– we have collected
data from 109 routers
– the volume is about
75TB
(though we have worked
only on few selected events)
– manual processed the
data
– randomized it
– labeled it accordingly
– used libraries to
normalize & label
– as our work is to
predict, kind of
YES/NO, so we moved
for Supervised learning
and choose classification
algorithm
– started with small
amount of data
– it has 5k of event
– split it for test & train
– again used 80K event
sample for test & train
– primarily worked with
a single algorithm
– for anomaly check,
tested two different
algorithm
– tested with define
sample % with different
contamination ratio
23. The Process for AI – standard procedure, “strategies to collect data”
How is the data ?
– The data has multiple features
– Too much noise is there
24. The Process for AI – standard procedure, “strategies to collect data”
How is the data ?
– The data has multiple features
– Too much noise is there
So how we start working on the DATA ?
– Separated Open DNS Resolver to reduce data
noise
– Collected the data that are directed to Anycast
DNS server
– Collected the data that are directed to DNS
server outside of link3 cloud
– Started with “GO with the Book” formula
26. The Process for AI – standard procedure. “formatted data”
7.04,4,280,0,318,70,1,normalDNSFlow
7.04,4,280,0,318,70,1,normalDNSFlow
7.05,4,276,0,313,69,1,normalDNSFlow
7.05,4,276,0,313,69,1,normalDNSFlow
7.04,4,280,0,318,70,1,normalDNSFlow
7.04,4,280,0,318,70,1,normalDNSFlow
7.05,4,304,0,344,76,1,normalDNSFlow
7.05,4,304,0,344,76,1,normalDNSFlow
214.29,98,6475,0,241,66,1,dnsflood
226.31,173,10892,0,385,62,1,dnsflood
99.86,73,4560,0,365,62,1,dnsflood
63.67,129,8750,2,1099,67,1,dnsflood
222,18661,1.4M,84,50696,75,1,dnsflood
959.21,66129,5.8M,68,48762,88,1,dnsflood
437.5,84479,71.1M,193,1.3M,841,1,dnsflood
1712.01,99985,73.8M,58,344723,737,1,dnsflood
556.03,81319,81.7M,146,1.2M,1004,1,dnsflood
1799.38,250593,258.3M,139,1.1M,1030,1,dnsflood
481.86,90507,93.0M,187,1.5M,1027,1,dnsflood
27. The Process for AI – algorithm we choose to work with
Density Based
– DBSCAN
– LOF
Distance Based
– K-NN
– K-MEANS
Parametric
– GMM
– One-Class SVMs
Probabilistic
– Angle-Based Outlier Detection (ABOD)
– FastABOD
Proximity
– Proximity-based techniques define a data
point as an anomaly when its locality is
sparsely populated or very small in amount.
28. The Process for AI – algorithm we choose to work with
Density Based
– DBSCAN
– LOF
Distance Based
– K-NN
– K-MEANS
Parametric
– GMM
– One-Class SVMs
Probabilistic
– Angle-Based Outlier Detection (ABOD)
– FastABOD
We choose to work on Supervised Learning
– In Supervised Learning, data has to be labeled
– Training data set has to be there
Proximity
29. The Process for AI – algorithm we choose to work with, kNN
Density Based
– DBSCAN
– LOF
Distance Based
– K-NN
– K-MEANS
Parametric
– GMM
– One-Class SVMs
Probabilistic
– Angle-Based Outlier Detection (ABOD)
– FastABOD
For an observation, its distance to its kth nearest neighbor
could be viewed as the outlying score. It could be viewed as a
way to measure the density
method='largest',
algorithm='auto',
metric='minkowski',
p=2 (euclidean distance)
30. The Process for AI – algorithm we choose to work with, ABOD
Density Based
– DBSCAN
– LOF
Distance Based
– K-NN
– K-MEANS
Parametric
– GMM
– One-Class SVMs
Probabilistic
– Angle-Based Outlier Detection (ABOD)
– FastABOD
ABOD class for Angle-base Outlier Detection. For an
observation, the variance of its weighted cosine scores to all
neighbors could be viewed as the outlying score.
method='fast'
31. The Process for AI – GO with the Book; “working with ML Model”
Data
Normalization
has already
been done
Now Let’s
train &
test the
Model
32. The Process for AI – GO with the Book; “the test statistics”
10%
30%
20
%
40%
20%
33. “PyOD (Python Outlier Detection) is a comprehensive and scalable Python toolkit for
detecting outlying objects in multivariate data.”
– Zhao, Yue and Nasrullah, Zain and Li, Zheng
The Process for AI – “working with PyOD”
34. “PyOD (Python Outlier Detection) is a comprehensive and scalable Python toolkit for
detecting outlying objects in multivariate data.”
– Zhao, Yue and Nasrullah, Zain and Li, Zheng
** For Anomaly Detection, data was not manually labeled, rather the raw data has been used.
The Process for AI – “working with PyOD”
35. “PyOD (Python Outlier Detection) is a comprehensive and scalable Python toolkit for
detecting outlying objects in multivariate data.”
– Zhao, Yue and Nasrullah, Zain and Li, Zheng
** For Anomaly Detection, data was not manually labeled, rather the raw data has been used.
The Process for AI – “working with PyOD”
Imported modules
36. The Process for AI – “working with PyOD, kNN”
Sample data – 10%
Contamination – 5%, 10%, 20%
37. The Process for AI – “working with PyOD, ABOD”
Sample data – 10%
Contamination – 5%, 10%, 20%
40. The Process for AI – “What are the lessons?”
– The difference between noise and anomalies
– Noise often outnumber anomalies
– How to make use of label information/domain knowledge when available
– Fast runtime: can scale up to large datasets and high dimensional datasets
– Known behaviors under different data properties
– Can deal with different types of anomalies
– Its ability to deal with high dimensional problems
41. Reference
• Hoai-Vu Nguyen and Yongsun Choi, “Proactive Detection of DDoS Attacks Utilizing k-NN Classifier in an Anti-DDos Framework ” (World Academy of
Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:4, No:3, 2010 )
• Przemys ław Berezinski, Bartosz Jasiul, and Marcin Szpyrka; “ aw Berezinski, Bartosz Jasiul, and Marcin Szpyrka; “ An Entropy-Based Network Anomaly
Detection Method”, Entropy 2015, 17, 2367-2408 (https://www.mdpi.com/journal/entropy)
• Das, S., Islam, M.R., Jayakodi, N.K. and Doppa, J.R., “Active Anomaly Detection via Ensembles: Insights, Algorithms, and Interpretability.” arXiv
preprint arXiv:1901.08930. 2019.
• Ro, K., Zou, C., Wang, Z. and Yin, G., “Outlier detection for high-dimensional data. Biometrika” 2015.
• Suri, N.R. and Athithan, G., “Research Issues in Outlier Detection. In Outlier Detection: Techniques and Applications”, 2019.
• Milan Cermak, Pavel Celeda, Jan Vykopal, “Detection of DNS Traffic Anomalies in Large Networks”, Masaryk University, 2014 (
https://www.researchgate.net/publication/290882059)
• Zdrnja, B., Brownlee, N., Wessels, D., “Passive Monitoring of DNS Anomalies. In:Detection of Intrusions and Malware, and Vulnerability Assessment”,
2007.
• Manasrah, A.M., Hasan, A., Abouabdalla, O.A., Ramadass, S., “Detecting Bot-net Activities Based on Abnormal DNS Traffic”, (IJCSIS) International
Journal of Computer Science & Information Security, 2009.
• Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of machine learning research (JMLR), 20(96),
pp.1-7.
• Takashi Washio, Osaka University; IEEE International Conference on Data Mining, Singapore "Which Anomaly Detector should I use?", 2018
• LAKSHAY ARORA, NCU (THE NORTHCAP UNIVERSITY);Anomaly detection using PyOD, Analytics Vidhaya, 2019