Explore, Analyze and Visualize Data in Hadoop and NoSQL. Make massive quantities of machine data accessible, usable and valuable for the people who need it, at the speed they need it. Use Hunk to turn underutilized data into valuable insights in minutes, not weeks or months.
BlueData Hunk Integration: Splunk Analytics for Hadoop – BlueData, Inc.
BlueData is working in partnership with Splunk to streamline and accelerate the deployment and adoption of Hunk: Splunk Analytics for Hadoop. The BlueData EPIC software platform now integrates Hunk with Hadoop clusters running on virtualized on-premises infrastructure.
Using Hunk with the BlueData EPIC platform, our joint customers can quickly provision virtual Hadoop clusters together with Hunk in a matter of minutes – providing their data scientists and analysts with the ability to rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop.
Learn more at http://www.bluedata.com
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar – Databricks
This session will give a new dimension to Apache Spark’s usage. See how Apache Spark and other open source projects can be used together to provide a scalable, real-time monitoring system. Apache Spark plays the central role in this scalable solution, since without Spark Streaming we would not be able to process millions of events in real time. This approach offers many lessons for the DevOps/infrastructure domain on how to build a scalable, automated logging and monitoring solution using Apache Spark, Apache Kafka, Grafana, and other open-source technologies.
Sony PlayStation’s monitoring pipeline processes about 40 billion events every day and generates metrics in near real time (within 30 seconds). All the components used along with Apache Spark are horizontally scalable using auto-scaling techniques, which enhances the reliability of this efficient and highly available monitoring solution. Sony Interactive Entertainment has been using Apache Spark, and specifically Spark Streaming, for the last three years. Hear about some important lessons they have learned; for example, they still use Spark Streaming’s receiver-based method in certain use cases instead of direct streaming, and they will share the application of both methods, giving the knowledge back to the community.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production – Codemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples presented during this talk.
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ... – Spark Summit
In this presentation, we are going to talk about the state-of-the-art infrastructure we have established at Walmart Labs for the Search product using Spark Streaming and DataFrames. First, we have been able to successfully use multiple micro-batch Spark Streaming pipelines to update and process information like product availability, pick up today, etc., along with updating our product catalog information in our search index, at up to 10,000 Kafka events per second in near real time. Earlier, all product catalog changes in the index had a 24-hour delay; using Spark Streaming we have made it possible to see these changes in near real time. This addition has provided a great boost to the business by giving end customers instant access to features like availability of a product, store pick-up, etc.
Second, we have built a scalable anomaly detection framework purely using Spark DataFrames that is being used by our data pipelines to detect abnormality in search data. Anomaly detection is an important problem not only in the search domain but also in many domains such as performance monitoring, fraud detection, etc. During this, we realized that Spark DataFrames are not only able to process information faster but are also more flexible to work with. One can write Hive-like queries, Pig-like code, UDFs, UDAFs, Python-like code, etc., all in the same place very easily, and can build DataFrame templates that can be used and reused by multiple teams effectively. We believe that, if implemented correctly, Spark DataFrames can potentially replace Hive/Pig in the big data space and have the potential of becoming a unified data language.
We conclude that Spark Streaming and DataFrames are the key to processing extremely large streams of data in real time with ease of use.
Dataiku Big Data Paris - The Rise of the Hadoop Ecosystem – Dataiku
Snapshot of the Hadoop ecosystem at the beginning of 2014, with the rise of real-time and in-memory distributed processing frameworks that complement and supplant the MapReduce paradigm.
Spark Application Carousel: Highlights of Several Applications Built with Spark – Databricks
This talk from 2015 Spark Summit East covers 3 applications built with Apache Spark:
1. Web Logs Analysis: Basic Data Pipeline - Spark & Spark SQL
2. Wikipedia Dataset Analysis: Machine Learning
3. Facebook API: Graph Algorithms
Stratio Streaming is the result of combining the power of Spark Streaming as a continuous computing framework and the Siddhi CEP engine for complex event processing.
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark – Michael Stack
Wei Li of Alibaba
Track 2: Ecology and Solutions
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/
Michael Cutler (CTO and cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14, held at University College London.
TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation was the inflexibility and high latency of Hadoop Map/Reduce jobs and the knock-on effect for technology that utilises it (Mahout machine learning, Hive data warehousing, Cascading).
With two primary use cases, ‘Ecommerce Personalisation’ and ‘Marketing Automation’, TUMRA currently flow around 29 million ‘user engagement events’ (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second.
TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
To learn more about how we use Spark and the services we can deliver through our Platform please contact: hello@tumra.com
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition) – Uwe Printz
Talk held at the IT-Stammtisch Darmstadt on 08.11.2013
Agenda:
- What is Big Data & Hadoop?
- Core Hadoop
- The Hadoop Ecosystem
- Use Cases
- What‘s next? Hadoop 2.0!
http://bit.ly/1BTaXZP – As organizations look for even faster ways to derive value from big data, they are turning to Apache Spark, an in-memory processing framework that offers lightning-fast big data analytics, providing speed, developer productivity, and real-time processing advantages. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk will give an introduction to the Spark stack, explain how Spark achieves lightning-fast results, and show how it complements Apache Hadoop. By the end of the session, you’ll come away with a deeper understanding of how you can unlock deeper insights from your data, faster, with Spark.
Accelerate Hadoop and Spark deployment in a multi-tenant lab environment for dev/test/QA, evaluation of multiple tools for Big Data analytics, and other use cases. BlueData provides a turnkey on-premises solution with software and services to get up and running in two weeks.
The new Big Data Lab Accelerator solution provides a full enterprise license of BlueData EPIC software along with the professional services needed to deploy an on-premises multi-tenant Big Data lab. Within two weeks, customers will have a lab environment to evaluate Big Data tools and spin up multiple Hadoop or Spark clusters for development, testing and quality assurance. As part of this deployment, BlueData will also work with customers to implement initial use cases for Big Data analytics.
Learn more about BlueData at www.bluedata.com
Hopsworks in the Cloud – Berlin Buzzwords 2019 – Jim Dowling
This talk, given at Berlin Buzzwords 2019, describes the recent progress in making Hopsworks a cloud-native platform, with HA data-center support added for HopsFS.
Strata Singapore 2017 business use case section
"Big Telco Real-Time Network Analytics"
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/62797
Presentation detailing the capabilities of in-memory analytics using Apache Spark: an Apache Spark overview covering the programming model, cluster mode with Mesos, supported operations, and a comparison with Hadoop MapReduce, and elaborating on the Apache Spark stack expansion: Shark, Streaming, MLlib, GraphX.
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu... – Splunk
.conf Go 2023 presentation:
"Das passende Rezept für die digitale (Security) Revolution zur Telematik Infrastruktur 2.0 im Gesundheitswesen?"
Speaker: Stefan Stein -
Team Lead CERT | gematik GmbH, M.Eng. IT Security & Forensics,
doctoral student at TH Brandenburg & Universität Dresden
.conf Go 2023 presentation:
De NOC a CSIRT (From NOC to CSIRT)
Speakers:
Daniel Reina - Country Head of Security Cellnex (España) & Global SOC Manager Cellnex
Samuel Noval - Global CSIRT Team Leader, Cellnex
Splunk - BMW connects business and IT with data-driven operations, SRE and O11y – Splunk
BMW is defining the next level of mobility - digital interactions and technology are the backbone to continued success with its customers. Discover how an IT team is tackling the journey of business transformation at scale whilst maintaining (and showing the importance of) business and IT service availability. Learn how BMW introduced frameworks to connect business and IT, using real-time data to mitigate customer impact, as Michael and Mark share their experience in building operations for a resilient future.
Data foundations building success, at city scale – Imperial College London – Splunk
Universities have more in common with modern cities than traditional places of learning. This mini city needs to empower its citizens to thrive and achieve their ambitions. Operationalising data is key to building critical services; from understanding complex IT estates for smarter decision-making to robust security and a more reliable, resilient student experience. Juan will share his experience in building data foundations for a resilient future whilst enabling digital transformation at Imperial College London.
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen... – Splunk
Learn how Vodafone has provided end-to-end visibility across services by building an Operational Analytics Platform. In this session, you will hear how Stefan and his team manage legacy, on premise, hybrid and public cloud services, and how they are providing a platform for complex triage and debugging to tackle use cases across Vodafone’s extensive ecosystem.
.italo operates an Essential Service by connecting more than 100 million people annually across Italy with its super fast and secure railway. And CISO Enrico Maresca has been on a whirlwind journey of his own.
Formerly a Cyber Security Engineer, Enrico started at .italo as an IT Security Manager. One year later, he was promoted to CISO and tasked with building out – and significantly increasing the maturity level – of the SOC. The result was a huge step forward for .italo.
So how did he successfully achieve this ambitious ask? Join Enrico as he reveals the key insights and lessons learned in his SOC journey, including:
Top challenges faced in improving security posture
Key KPIs implemented in order to measure success
Strategies and approaches applied in the SOC
How MITRE ATT&CK and Splunk Enterprise Security were utilised
Next steps in their maturity journey ahead
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 – Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software Fuzzing – Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns – and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... – Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Full-RAG: A modern architecture for hyper-personalization – Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! – SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Pushing the limits of ePRTC: 100ns holdover for 100 days – Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... – SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of their features, but many features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
A tale of scale & speed: How the US Navy is enabling software delivery from l... – sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
GridMate - End to end testing is a critical piece to ensure quality and avoid... – ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
20 Comprehensive Checklist of Designing and Developing a Website – Pixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
2. Splunk
Disruptive Approach to Unstructured Data
[Slide contrasts two eras of data management:]
1980-2010: structured data, RDBMS, SQL search, ETL, schema at write
2010+: unstructured data (volume, velocity, variety), universal indexing, schema at read
3. Splunk Today: Platform for Machine Data
[Slide diagram: data sources feeding Splunk – forwarders; syslog, TCP, and other inputs; DB Connect; mobile; mainframe data; VMware; Exchange; PCI security; sensors and control systems; Stream – with a 600+ ecosystem of apps.]
5. Splunk and Big Data Technologies
Relational database (highly structured): SQL and MapReduce – Oracle, MySQL, IBM DB2, Teradata
Distributed file system (semi-structured): Hadoop – HDFS storage + MapReduce
Key/value, columnar, or other (semi-structured): NoSQL – Cassandra, Accumulo, MongoDB
Splunk: temporal, unstructured, heterogeneous data with real-time indexing
6. Massive Linear Scalability to Tens of TBs/Day
Send data from thousands of servers using a combination of Splunk forwarders, syslog, WMI, message queues, or other remote protocols.
Auto-load-balanced forwarding to as many Splunk indexers as you need to index terabytes per day.
Offload search load to Splunk search heads.
Automatic load balancing linearly scales indexing; distributed search and MapReduce linearly scale search and reporting.
7. Splunk Real-Time Analytics
[Slide diagram of the indexing pipeline:]
Inputs (monitor, TCP/UDP, scripted) feed the parsing queue.
Parsing pipeline: source and event typing, character set normalization, line breaking, timestamp identification, regex transforms.
Index queue and indexing pipeline: write raw data and index files to the Splunk index.
A real-time buffer feeds the real-time search process in parallel.
8. Search Head Clustering
Ability to group search heads into a cluster in order to provide highly available and scalable search services for thousands of users. Mission critical, enterprise grade.
9. Splunk Index Replication – High Availability
1. The master auto-detects that a peer is down.
2. The master asks the redundant peer to act as primary.
3. Peers copy the search files, index files, and raw data.
Default is 3x replication.
11. Splunk and Hadoop
Hunk: main use case = analyze Hadoop data using Hadoop processing
Splunk Hadoop Connect: main use case = real-time export of data from Splunk to Hadoop
Hunk Archive: main use case = archive Splunk indexes to Hadoop
Splunk HadoopOps: main use case = monitor Hadoop
13. Hunk – Unique
1. Runs natively in Hadoop: uses Hadoop MapReduce
2. Mixed mode: allows for data preview
3. Auto-deploys splunkd to DataNodes: on-the-fly indexing
4. Access control: allows for many users and many Hadoop directories; supports Kerberos
5. Schema on the fly
14. Run Natively in Hadoop
[Architecture diagram: the Hunk search head submits MapReduce jobs to an external resource (e.g. hadoop.prod); the NameNode and JobTracker (YARN) schedule tasks on DataNodes/TaskTrackers, which run against HDFS using a working directory and index on the data nodes.]
15. Mixed-mode Search
[Timeline diagram: Splunk streaming previews are shown until the Hadoop MapReduce / Splunk index search takes over at the switch-over time.]
Data preview: allows users to search interactively by pausing and refining queries.
16. Indexing On the Fly – Hunk Data Processing
[Data-flow diagram: TaskTrackers run a MapReduce search process over raw and preprocessed data in HDFS; remote results flow back through the ERP search process on the search head, which merges them into the final search results.]
17. Role-based Security for Shared Clusters
Pass-through authentication:
• Provides role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security
[Diagram: each user role maps to its own Hadoop queue, e.g. business analyst → Biz Analytics queue, marketing analyst → Marketing queue, sys admin → Prod queue.]
18. Managed Archiving: Splunk Enterprise to Hunk/HDFS
• Archive buckets to Hadoop (HDFS) instead of freezing buckets or throwing data away
• Store old data at up to one-tenth the cost in Hadoop's cheap batch storage instead of on SANs
• Optimize Splunk Enterprise search head performance for real-time monitoring, alerting, and dashboarding with short-term historical context
• Use Hunk to search, analyze, and visualize months or years of historical data in Hadoop
• Run federated queries and dashboards across Splunk Enterprise and Hunk
[Diagram: warm and cold buckets on the indexers roll to frozen storage on Hadoop clusters.]
20. Yahoo - Visualizing Hadoop
[Screenshot: a Hunk search over the last 7 days (1,175,726 events, 5/20/14 8:00:00.000 PM to 5/27/14 8:26:26.000 PM) with a timechart of GB-hours by queue (apg_dailyhigh_p3, apg_dailymedium_p5, apg_hourlyhigh_p1, apg_hourlylow_p4, apg_hourlymedium_p2, apg_p7, curveball_large, curveball_med, slingshot, slingstone, OTHER):]
index="jobsummary_logs_all_red" cluster="dilithium*" | eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds) | eval gb_hours=((total_slot_seconds * 0.5) / 3600) | eval gb_hours=round(gb_hours) | timechart span=6h sum(gb_hours) as gb_hours by queue
• 600 PB of data
• Very large clusters used by many groups across the enterprise
• 35,000 individual DataNodes
• Hadoop is provided as a self service
21. Vantrix – Mobile Media Optimization
144 Hadoop nodes, 69 TB SSD storage, analytics application
10 million subscribers generate:
• 80 GB of raw session log data per day
• 26 million video data session records
Hunk query: 20 seconds to search through 27M events, returning 4.7M events
Hunk as indexer: automatically indexed and counted field-value occurrences
Hunk as self-service: proved invaluable for identifying and exploring use cases
Hunk business value: helped identify when subscribers abandon video
But listening to your machine data isn’t as easy as it sounds.
Machine Data is different:
It is voluminous unstructured time series data with no predefined schema
It is generated by all IT systems – from servers and applications to RFIDs and wire data.
It is non-standard data and characterized by many unpredictable and changing formats
Because of this, machine data cannot be managed using traditional approaches.
Traditional approaches require you to transform your data and force fit it into a brittle schema – They aren’t designed to handle the inconsistent machine data formats
Traditional approaches are designed with specific use cases and queries in mind – they limit the problems that you can solve
Traditional approaches rely on siloed tools that are designed for structured data approaches and legacy computing environments – They are inherently limited in their ability to scale
To listen to your machine data, you need a solution with no limits:
No limits on the formats of data
No limits on where you can collect the data from
No limits on the questions that you can ask and the use cases you can solve.
And no limits on scale.
You need a solution that can keep up with Machine Data.
Since then, Splunk has invested significantly to expand from a search tool to a mission-critical platform. The platform includes hundreds of data types and can scale to massive volumes
Today, it’s more than Splunk Enterprise: we’ve added Splunk Cloud, Hunk, and Splunk MINT for mobile intelligence, and we have more than 600 apps.
Machine data is more than logs! It’s wire data, mainframe data, mobile device data, sensor data, and metrics.
Your use cases have evolved well beyond troubleshooting so we’re investing in solutions that leverage the power of Splunk Enterprise to provide you with packaged views into your data for faster, deeper insights.
Our most well-known solution is Splunk Enterprise Security and if you aren’t using it yet, we encourage you to find out why it’s turning the traditional SIEM market upside down.
How has big data evolved over time? For a long time, ‘big data’ was simply a large database.
The database industry, in order to handle large data, moved to many smaller databases. Horizontal partitioning (also known as sharding) is a database design principle whereby rows of a database table are held separately (for example, A-D in one database, E-H in a second database, etc.).
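As a minimal sketch of that routing step, sharding can be pictured as a small lookup table; the shard names and the route helper here are illustrative, not any particular database's API:

    # Route each row to a shard by the first letter of its key,
    # mirroring the A-D / E-H example above.
    SHARDS = {
        "shard1": ("A", "D"),
        "shard2": ("E", "H"),
        "shard3": ("I", "Z"),
    }

    def route(key: str) -> str:
        """Return the shard that should hold the row for this key."""
        first = key[0].upper()
        for shard, (lo, hi) in SHARDS.items():
            if lo <= first <= hi:
                return shard
        raise ValueError(f"no shard covers key {key!r}")

    print(route("Alice"))  # shard1
    print(route("Gary"))   # shard2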
Hadoop, inspired by Google's MapReduce and GFS papers, became the de-facto big data system. Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It has emerged as a popular way to handle massive amounts of data, including structured and complex unstructured data. Its popularity is due in part to its ability to store and process large amounts of data effectively across clusters of commodity hardware. Apache Hadoop is not actually a single product but a collection of several components. For the most part, Hadoop is a batch-oriented system.
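To make the batch-oriented model concrete, here is a toy word count following the MapReduce shape (map emits key/value pairs, shuffle groups by key, reduce aggregates); this is a sketch of the paradigm in plain Python, not Hadoop code:

    from collections import defaultdict

    def map_phase(lines):
        # Map: emit (word, 1) for every word in every input line.
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    def shuffle(pairs):
        # Shuffle: group all values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Reduce: aggregate each key's values.
        return {key: sum(values) for key, values in groups.items()}

    lines = ["the quick brown fox", "the lazy dog"]
    print(reduce_phase(shuffle(map_phase(lines))))  # {'the': 2, ...}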
Teradata Aster Data and SQL-on-Hadoop systems provide SQL interfaces that can talk to Hadoop.
Cassandra and HBase are NoSQL databases that can process data using a key/value model in real time.
Splunk is a real-time analytics platform for temporal, unstructured, heterogeneous data.
Splunk allows you to divide up the work of search and indexing across as many servers as you need to achieve the performance and scale you require. Using work-dividing techniques such as MapReduce, Splunk can take a single search and query as many indexers as you need to complete the job, allowing you to use inexpensive commodity hardware in massively parallel clusters.
For example, if you had 1 million events to search, one indexer could easily complete that search, but it would take a little time, say 30 seconds. If the same million events were spread across 10 indexers, the same search would complete in 3 seconds. How fast and how large your searches run is yours to control by adding indexers as desired.
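A back-of-the-envelope sketch of that arithmetic, plus the scatter-gather shape itself; the scan rate and the partial_search helper are illustrative assumptions, not Splunk internals:

    from concurrent.futures import ThreadPoolExecutor

    EVENTS = 1_000_000
    SCAN_RATE = EVENTS / 30            # one indexer: 1M events in ~30 s

    def search_time(num_indexers: int) -> float:
        # Each indexer scans its share in parallel; merge overhead ignored.
        return (EVENTS / num_indexers) / SCAN_RATE

    print(search_time(1))   # 30.0 seconds
    print(search_time(10))  # 3.0 seconds

    def partial_search(shard):          # hypothetical per-indexer search
        return [e for e in shard if "error" in e]

    shards = [["error a", "ok"], ["ok", "error b"]]
    with ThreadPoolExecutor() as pool:
        merged = [hit for hits in pool.map(partial_search, shards) for hit in hits]
    print(merged)  # ['error a', 'error b']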
For the most part, you can use monitor to add nearly all your data sources from files and directories. However, you might want to use upload to add one-time inputs, such as an archive of historical data. You can enable Splunk to accept an input on any TCP or UDP port. Splunk consumes any data sent on these ports. Use this method for syslog (default port is UDP 514), or set up netcat and bind to a port. TCP is the protocol underlying Splunk's data distribution and is the recommended method for sending data from any remote machine to your Splunk server. Splunk can index remote data from syslog-ng or any other application that transmits via TCP. However, there are times when you want to use scripts to feed data to Splunk for indexing, or prepare data from a non-standard source so Splunk can properly parse events and extract fields. You can use shell scripts, python scripts, Windows batch files, PowerShell, or any other utility that can format and stream the data that you want Splunk to index. You can stream the data to Splunk or write the data from a script to a file.
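For example, a scripted input can be any program whose stdout Splunk indexes on a schedule; this is a minimal sketch (the load-average metric is just an illustrative choice, and os.getloadavg is POSIX-only):

    #!/usr/bin/env python3
    import os
    import time

    # Splunk runs the script periodically and indexes whatever it prints.
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    load1, load5, load15 = os.getloadavg()
    print(f"{timestamp} load1={load1:.2f} load5={load5:.2f} load15={load15:.2f}")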
All data that comes into Splunk enters through the parsing pipeline as large chunks. During parsing, Splunk breaks these chunks into events which it hands off to the indexing pipeline, where final processing occurs. During both parsing and indexing, Splunk acts on the data, transforming it in various ways. Most of these processes are configurable, so you have the ability to adapt them to your needs.
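A toy version of those parse steps, assuming simple illustrative patterns (Splunk's real pipeline is driven by configuration, not hard-coded regexes):

    import re

    TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
    MASK = re.compile(r"\b\d{16}\b")  # e.g. mask 16-digit card numbers

    def parse(chunk: str):
        for line in chunk.splitlines():                  # line breaking
            if not line.strip():
                continue
            match = TS.search(line)                      # timestamp identification
            event = MASK.sub("*" * 16, line)             # regex transform
            yield {"_time": match.group() if match else None, "_raw": event}

    chunk = "2014-05-20 08:00:00 purchase card=4111111111111111\n"
    print(list(parse(chunk)))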
To kick off a real-time search in Splunk Web, use the time range menu to select a preset Real-time time range window, such as 30 seconds or 1 minute. You can also specify a sliding time range window to apply to your real-time search. This defines a real-time buffer.
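The sliding window behind a real-time buffer can be sketched as a deque that expires events older than the window span (illustrative only, not Splunk's implementation):

    import time
    from collections import deque

    class RealTimeWindow:
        def __init__(self, span_seconds: float):
            self.span = span_seconds
            self.buffer = deque()            # (arrival_time, event) pairs

        def add(self, event):
            now = time.time()
            self.buffer.append((now, event))
            while self.buffer and self.buffer[0][0] < now - self.span:
                self.buffer.popleft()        # expire events outside the window

        def events(self):
            return [e for _, e in self.buffer]

    window = RealTimeWindow(span_seconds=30)
    window.add("GET /index.html 200")
    print(window.events())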
The Splunk Index is the repository for Splunk Enterprise data. Splunk Enterprise transforms incoming data into events, which it stores in indexes.
Faster Recovery II:
If you look at the screen, the two indexers on the left with green cylinders hold searchable copies of the data; the two indexers on the right hold only raw data.
When a peer goes down, the master waits for the heartbeat timeout and marks the peer down. It then reassigns primaries to another peer and tries to enforce the replication policy, making copies of the raw data and search files.
In 5.0, search files are generated on each peer from the raw data; in 6.0, the search files are copied over from a peer that already has them instead of being regenerated.
These statistics are from our internal tests. Another point to note: generating search files from the raw data is CPU-intensive compared to copying search files.
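The failover flow above can be sketched as follows; class and method names are invented for illustration and do not reflect Splunk's actual implementation:

    REPLICATION_FACTOR = 3

    class Cluster:
        def __init__(self, peers):
            self.up = set(peers)
            self.buckets = {}  # bucket name -> {"primary": peer, "copies": set}

        def place(self, bucket, primary, copies):
            self.buckets[bucket] = {"primary": primary, "copies": set(copies)}

        def on_peer_down(self, dead):
            self.up.discard(dead)                        # 1. mark the peer down
            for b in self.buckets.values():
                b["copies"].discard(dead)
                if b["primary"] == dead:                 # 2. promote a survivor
                    b["primary"] = next(iter(b["copies"]))
                while len(b["copies"]) < REPLICATION_FACTOR:
                    spare = next(p for p in self.up if p not in b["copies"])
                    b["copies"].add(spare)               # 3. copy raw + search files

    c = Cluster(["idx1", "idx2", "idx3", "idx4"])
    c.place("bucket_42", primary="idx1", copies=["idx1", "idx2", "idx3"])
    c.on_peer_down("idx1")
    print(c.buckets["bucket_42"])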
Quick to set up, scales to multiple concurrent databases
Enrich machine data with structured data from relational databases
Execute database queries directly from the Splunk user interface
Browse and navigate database schemas and tables
Combine machine data with structured data from relational databases
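The enrichment idea can be sketched with an in-memory SQLite table standing in for the relational database; this illustrates the concept, not the DB Connect app's API:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, tier TEXT)")
    db.execute("INSERT INTO customers VALUES ('c1001', 'gold')")

    # Machine-data events carry a customer_id; the database adds context.
    events = [{"customer_id": "c1001", "action": "login"}]
    for event in events:
        row = db.execute(
            "SELECT tier FROM customers WHERE id = ?", (event["customer_id"],)
        ).fetchone()
        event["tier"] = row[0] if row else "unknown"

    print(events)  # [{'customer_id': 'c1001', 'action': 'login', 'tier': 'gold'}]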
Search execution:
The Hunk search head takes the list of contents of the directories in the virtual index. The search head filters directories and files based on the search and time range (partition pruning).
The NameNode and JobTracker (the MapReduce resource manager in YARN) read data from the MapReduce framework and feed it to the search process. The process computes file splits, then constructs and submits the MapReduce jobs.
Hunk streams a few file splits from HDFS and processes them in the search head to provide quick previews. The search head consumes and merges the MapReduce results (providing incremental previews) while the MapReduce jobs kick off.
The data nodes run a copy of splunkd to process the jobs and write the results to a working directory in HDFS.
Final results are stored in the Hunk search head.
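The partition-pruning step can be pictured like this, assuming date-named directories (a common but not universal HDFS layout):

    from datetime import date

    def prune(directories, earliest: date, latest: date):
        kept = []
        for path in directories:
            # Assume a .../YYYY/MM/DD layout; real layouts vary per deployment.
            y, m, d = path.rstrip("/").split("/")[-3:]
            if earliest <= date(int(y), int(m), int(d)) <= latest:
                kept.append(path)
        return kept

    dirs = ["/data/logs/2014/05/19", "/data/logs/2014/05/21", "/data/logs/2014/06/02"]
    print(prune(dirs, date(2014, 5, 20), date(2014, 5, 27)))
    # ['/data/logs/2014/05/21']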
Hunk utilizes the Splunk Search Processing Language, the industry-leading method to enable interactive data exploration across large, diverse data sets. There is no requirement to "understand" data up front. For customers of Splunk Enterprise, reuse your Search Processing Language knowledge and skill set for data stored in Hadoop. Any commands whose output depends on the event input order would yield different results – this is because Splunk guarantees events to be delivered in descending time order. Hunk doesn’t. This is the reason why transaction and localize do not work.
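"Schema on the fly" means fields are pulled out of the raw event at search time rather than fixed at index time; here is a minimal sketch with an illustrative key=value pattern, mirroring the idea rather than Splunk's extraction engine:

    import re

    KV = re.compile(r"(\w+)=(\S+)")

    def extract_fields(raw: str) -> dict:
        # Search-time extraction: the schema comes from the event itself.
        return dict(KV.findall(raw))

    event = "2014-05-20 08:00:01 status=500 uri=/checkout latency_ms=842"
    fields = extract_fields(event)
    print(fields["status"], fields["uri"])  # 500 /checkout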
We can see the results from the intermediate Hadoop Map jobs getting streamed into the Splunk UI even before all the Map jobs are finished; once all the Hadoop Map jobs are done processing, Splunk displays the full results.
In essence, Splunk acts as the Hadoop Reduce phase and there is no need to use Hadoop for that phase.
Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in. Allows users to search interactively by pausing and refining queries.
This is a major, unique advantage of Hunk compared to alternative approaches such as Hive or SQL-on-Hadoop, which require a fixed schema in an effort to speed up searches, while Hunk retains the combination of schema on the fly with results preview.
With this new feature, planned for the next Hunk release (version 6.2.1), you can archive buckets to Hadoop (the Hadoop Distributed File System, or HDFS) instead of freezing buckets or throwing data away. This significantly lowers the total cost of ownership (TCO) for Splunk Enterprise installations while giving security analysts, risk managers, and marketers access to months or years of historical data integral to their job success.
Store old data at up to one-tenth the cost in Hadoop's cheap batch storage instead of on SANs.
Optimize Splunk Enterprise search head performance for real-time monitoring, alerting, and dashboarding with short-term historical context.
Use Hunk to search, analyze, and visualize months or years of historical data in Hadoop.
Run federated queries and dashboards across Splunk Enterprise and Hunk.
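As a sketch of the archiving step, Splunk Enterprise can hand each expiring bucket to a script via its coldToFrozenScript hook, which passes the bucket directory as the script's argument; the HDFS destination below is an assumption for illustration:

    #!/usr/bin/env python3
    import subprocess
    import sys

    HDFS_ARCHIVE = "/archive/splunk"  # hypothetical HDFS target directory

    def archive_bucket(bucket_dir: str) -> None:
        # Copy the whole bucket directory into HDFS batch storage
        # instead of deleting it when it freezes.
        subprocess.run(["hdfs", "dfs", "-put", bucket_dir, HDFS_ARCHIVE], check=True)

    if __name__ == "__main__":
        archive_bucket(sys.argv[1])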