About ExxonMobil and Geoffrey Martins
Why Shared Service?
The Four Major Challenges
Final Unified Network
Next Steps
Takeaways on how to build a successful Shared Service
Q&A
Splunk live! Inteligência operacional em um mundo de bigdata (Splunk)
This document discusses big data and machine data analytics. It describes how machine data from various sources like servers, security devices, sensors, and mobile devices can provide valuable insights but is challenging to manage and analyze at scale. The document promotes Splunk software's capabilities for ingesting, indexing, and analyzing large volumes of machine data from any source in real-time to provide operational intelligence and turn machine data into business value across use cases like IT operations, security, and analytics. It also advertises Splunk's upcoming annual user conference to showcase new capabilities for machine data analytics.
Deploying Splunk. Arquitetura e dimensionamento do Splunk (Splunk)
The document discusses architecting and sizing a Splunk deployment. It covers key factors to consider like data volume, search volume, and roles of servers in a distributed Splunk topology. Recommendations are provided around server configurations based on roles like indexer, search head, and forwarder. A reference server specification is also outlined for estimating hardware needs.
SplunkLive! São Paulo 2014 - Overview by Markus Zirn (Splunk)
1. The document discusses how Splunk software provides operational intelligence by collecting data from anywhere, allowing users to search and analyze everything, and gain real-time operational insights.
2. It highlights several Splunk customers and how they use Splunk across various industries and use cases such as IT operations, security, application management, and business analytics.
3. The document promotes Splunk's 5th Annual Worldwide User Conference in October 2014 with sessions, speakers, and opportunities to learn about Splunk's platform and ecosystem.
The second presentation in Savi's sponsorship of the Washington DC Spark Interactive. It discusses using Spark with Drools to create expert-systems-based analytics for the Internet of Things (IoT).
Life occurs in real time, and not surprisingly, more solutions are being built using streaming technologies. Event-based architectures are becoming the norm, and customers expect immediate access to their data. This new world offers many exciting opportunities, but also some new challenges. What do you do when your streaming data is not complete? What if it relies on another data source? Does the dependent data exist yet, and does it come from a third party? How do we assemble a complete picture when data arrives from multiple places at the same time? This is the new norm in the world of distributed services. Join us as we dive deep into the technical details around these scenarios and more. Expect to learn about stream-stream joins, enriching stream data using local or remote data, and ways to anticipate and correct errors within the stream. Leave with a better understanding of managing data dependencies within a Spark Structured Streaming application.
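As a concrete illustration of the stream-stream join pattern this abstract mentions, here is a minimal PySpark sketch. The Kafka topics, field names, watermarks, and the 30-minute join window are assumptions made for the example, not details from the talk.

```python
# Minimal sketch: joining two streams with watermarks so Spark knows how
# long to buffer state while waiting for the dependent data to arrive.
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("stream-stream-join").getOrCreate()

def read_topic(topic):
    # Read one Kafka topic as a streaming DataFrame (topic names assumed).
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", topic)
            .load())

impressions = read_topic("impressions").selectExpr(
    "CAST(key AS STRING) AS ad_id", "timestamp AS impression_time")
clicks = read_topic("clicks").selectExpr(
    "CAST(key AS STRING) AS click_ad_id", "timestamp AS click_time")

# Watermarks bound the buffered state; the BETWEEN clause tolerates
# out-of-order arrival of the dependent stream.
joined = (impressions.withWatermark("impression_time", "10 minutes")
          .join(clicks.withWatermark("click_time", "20 minutes"),
                expr("""ad_id = click_ad_id AND
                        click_time BETWEEN impression_time AND
                        impression_time + interval 30 minutes""")))

joined.writeStream.format("console").start().awaitTermination()
```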
Scaling ML-Based Threat Detection For Production Cyber Attacks (Databricks)
The document discusses best practices for integrating machine learning models into production pipelines. It describes the full data science product lifecycle from identifying business needs to deploying models through APIs. Key aspects covered include maintainable code through functions/classes, unit testing and code reviews, using Jenkins and a tool called Apparate to schedule Spark jobs and automatically update libraries, and deploying APIs on Kubernetes through Spinnaker for continuous delivery. Lessons learned emphasize leveraging existing tools and infrastructure while addressing pain points to streamline the end-to-end process.
From “All-at-Once, Once-a-Day” to “A-Little-Each-Time, All-the-Time” with Ema... (Databricks)
Emanuele Bardelli from OLX Group presented on their transition from batch to near-real-time data processing using Apache Spark. OLX operates classifieds platforms across 40+ countries and processes over 4 billion events per day. They moved from processing data once per day to processing data in small chunks within 10 minutes of events occurring. This allows them to send targeted messages to users in near-real-time. Spark is used to continuously load, transform, and store features from raw data files incrementally as new files are received. This enables more creative and flexible user communication through their lifecycle management services.
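A hedged sketch of the "a little each time" idea: Structured Streaming's file source only picks up files that arrived since the last micro-batch, so each trigger processes an increment instead of the whole day's data. The path, schema, and 10-minute trigger below are illustrative assumptions, not OLX's actual configuration.

```python
# Sketch: incremental feature extraction from newly arrived files.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("incremental-features").getOrCreate()

schema = (StructType()
          .add("user_id", StringType())
          .add("event_type", StringType())
          .add("event_time", TimestampType()))

# Only files not yet seen by the query are read on each trigger.
raw = spark.readStream.schema(schema).json("s3://events/raw/")

# Running per-user event counts, refreshed every 10 minutes.
features = raw.groupBy("user_id", "event_type").count()

(features.writeStream
 .outputMode("complete")
 .format("console")            # stand-in sink for the sketch
 .trigger(processingTime="10 minutes")
 .start())
```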
Driving the On-Demand Economy with Spark and Predictive Analytics (SingleStore)
The document discusses how data scientists need real-time analytics capabilities to power the on-demand economy. It introduces MemSQL 5 as a database platform for real-time analytics that can help overcome barriers like slow loading, queries, and ongoing data processing faced with batch processing. MemSQL 5 includes features like Streamliner for building real-time data pipelines and predictive analytics using Spark and MLlib to power applications like predictive scoring and IoT.
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters (Brett Sheppard)
The document discusses Hunk, a self-service analytics platform for exploring, visualizing, and analyzing data stored in Hadoop clusters and other data stores. Hunk allows users to rapidly interact with data through an interactive search interface and preview results without waiting for full queries to finish. It provides integrated visualization of data through built-in graphs and charts. Hunk deployment is fast, requiring under 60 minutes to connect to Hadoop clusters and begin searching data.
Real-Time, Geospatial, Maps by Neil Dahlke (SingleStore)
This document discusses two real-time geospatial analytics demos using MemSQL - PowerStream and Supercar. PowerStream predicts the health of 197,000 wind turbines globally using 1 million data points per second from sensors. Supercar tracks NYC taxi and limo data in real-time to analyze the on-demand economy. Both demos extract, transform and load streaming data into MemSQL for real-time querying and visualization.
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ... (Maya Lumbroso)
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can Speed up the World”
Bio:
Ronan Corkery is a kdb+ engineer who has been working with Kx and First Derivatives for the past four years. Currently based at Total Gas and Power, he spent his first two years working with Morgan Stanley.
Abstract:
Ronan's presentation will focus on the vertical industries that Kx's formerly finance-only technologies have been moving into. He will present proven solutions, introduce the overall architecture that Kx uses, and lay out potential opportunities to work with Kx.
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL (SingleStore)
1) The document discusses building real-time data pipelines with Apache Spark and MemSQL to enable real-time analytics.
2) It describes combining the power of Spark for real-time transformations with MemSQL, a real-time database, to make Spark results more accessible.
3) The presentation includes a demo of PowerStream, a MemSQL application that predicts the health of wind turbines using streaming data.
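To make the pipeline shape above concrete, here is a hedged PySpark sketch of the Kafka-to-Spark-to-MemSQL flow. MemSQL speaks the MySQL wire protocol, so a generic JDBC sink stands in for the MemSQL Spark connector; the topic, table, host, and credentials are placeholder assumptions.

```python
# Sketch: stream sensor events from Kafka, land each micro-batch in MemSQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-memsql").getOrCreate()

readings = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "turbine-sensors")
            .load()
            .selectExpr("CAST(key AS STRING) AS turbine_id",
                        "CAST(value AS STRING) AS payload",
                        "timestamp"))

def write_batch(df, batch_id):
    # Each micro-batch becomes immediately queryable in MemSQL.
    (df.write.format("jdbc")
       .option("url", "jdbc:mysql://memsql-host:3306/sensors")
       .option("dbtable", "readings")
       .option("user", "app")
       .option("password", "***")
       .mode("append")
       .save())

readings.writeStream.foreachBatch(write_batch).start().awaitTermination()
```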
CTO View: Driving the On-Demand Economy with Predictive Analytics (SingleStore)
In the on-demand economy, real-time analytics is both a necessity and a competitive advantage. The next evolution in the on-demand economy is predictive analytics fueled by live streams of data—in effect, knowing what customers want before they do. This session will feature technical examples of real-time pipelines, machine learning, and custom dashboards, as well as off-the-shelf dashboards with Tableau.
Realtime data processing with Flink and Druid by Youngpyo Lee, SKT (Metatron)
This document discusses using Flink for real-time data processing and Druid for analytics. It describes problems with processing large volumes of data with late and missing entries. As a case study, it examines real-time driving score processing. It proposes using Flink as a streaming engine to ingest data into Druid for analytics over sliding windows. Key considerations for Flink and integrating it with Druid and Kafka are discussed. The document concludes with questions about implementing this solution using Metatron tools.
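The talk itself uses Flink, but to keep this page's examples in one language, here is the sliding-window aggregation at the heart of the driving-score case sketched in PySpark; the topic, column names, window sizes, and watermark are assumptions.

```python
# Sketch: average driving score per driver over a 10-minute window
# sliding every minute, tolerating events up to 5 minutes late.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, avg

spark = SparkSession.builder.appName("driving-score").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "driving-events")
          .load()
          .selectExpr("CAST(key AS STRING) AS driver_id",
                      "CAST(CAST(value AS STRING) AS DOUBLE) AS score",
                      "timestamp AS event_time"))

scores = (events.withWatermark("event_time", "5 minutes")
          .groupBy(window("event_time", "10 minutes", "1 minute"),
                   "driver_id")
          .agg(avg("score").alias("avg_score")))

# In the architecture described, these aggregates would be ingested by
# Druid; the console sink is a stand-in for that hand-off.
scores.writeStream.outputMode("append").format("console").start()
```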
1) In-memory computing is growing rapidly, with the total data market expected to grow from $69 billion in 2015 to $132 billion in 2020.
2) In-memory databases are gaining popularity for applications that require fast response times, like telecommunications and mobile advertising, as memory access is faster than disk access.
3) Modern applications are driving adoption of in-memory solutions as they generate more data from more users and transactions and require faster performance to handle growing traffic.
4) Two examples presented were DellEMC using MemSQL for a real-time customer 360 application and an IoT logistics application called MemEx that processes sensor data from warehouses for predictive analytics.
Hadoop can enable zero downtime app deployments by using microservices, continuous delivery, and real-time analytics. The presenters describe how Expedia saves $5M annually through zero downtime deployments. Their architecture uses microservices, continuous integration, deployment monitoring with Storm/Kafka/HDFS, and analytics in Solr/Hive to enable canary testing, fast feedback, and automated problem resolution. A live demo shows log processing, analytics, and using results to ensure smooth, high-quality deployments.
The Fast Path to Building Operational Applications with Spark (SingleStore)
Nikita Shamgunov gave a presentation about using MemSQL and Spark together. MemSQL is a scalable operational database that can handle petabytes of data with high concurrency. It offers real-time capabilities and compatibility with tools like Spark, Kafka, and ETL/BI tools. The MemSQL Spark Connector allows bidirectional transfer of data between Spark and MemSQL tables for use cases like operationalizing models in Spark, stream/event processing, and live dashboards. Case studies showed customers gaining 10x faster data refresh times and performing entity resolution at scale for fraud detection.
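A rough sketch of the bidirectional flow the connector enables: pull a MemSQL table into Spark, score it, and write the results back where live dashboards can query them. A plain JDBC round trip stands in for the actual MemSQL Spark Connector API, and the table, columns, and toy scoring rule are assumptions.

```python
# Sketch: MemSQL -> Spark -> MemSQL round trip over JDBC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memsql-roundtrip").getOrCreate()

jdbc_opts = {
    "url": "jdbc:mysql://memsql-host:3306/app",
    "user": "app",
    "password": "***",
}

# MemSQL -> Spark: pull transactions for scoring.
txns = (spark.read.format("jdbc")
        .options(dbtable="transactions", **jdbc_opts)
        .load())

# Toy stand-in for a real model scoring step.
scored = txns.withColumn("fraud_score", txns["amount"] / 10000.0)

# Spark -> MemSQL: persist the scores for live dashboards.
(scored.write.format("jdbc")
 .options(dbtable="transaction_scores", **jdbc_opts)
 .mode("append")
 .save())
```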
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah (Databricks)
Insnap, a hyper-personalized ML-based platform acquired by The Honest Company, has been used to build a real-time data platform based on Apache Spark, Cassandra and Redshift. Users’ behavioral and transactional data have been used to build data models and ML models, and to drive use cases for marketing, growth, finance and operations.
Learn how The Honest Company has used Spark as a workhorse for 1) collecting, transforming (ETL), and storing data from various sources including mysql, mongo, jde, Google Analytics, Facebook, Localytics, and REST APIs; 2) building data models, and aggregating and generating reports on revenue, order-fulfillment tracking, data-pipeline monitoring, and subscriptions; 3) using ML to build models for user acquisition, LTV, and recommendation use cases. Spark replaced the monolithic codebase with flexible, scalable, and robust pipelines. Databricks helped The Honest Company focus on data instead of maintaining infrastructure. While Honest users got delightful recommendations that improved their experience, data users at Honest came to understand users much better, segmenting them with behavioral information and advanced ML models, leading to increased revenue and retention.
Real-Time Geospatial Intelligence at Scale (SingleStore)
This document introduces MemSQL 5, a real-time database platform for transactions and analytics. It discusses how MemSQL is designed for modern workloads by providing scalable SQL on in-memory and solid-state storage across distributed data centers or the cloud. MemSQL allows for real-time processing through features like stream processing and real-time dashboards. Examples are given of using MemSQL for Internet of Things applications to monitor wind turbines and taxi ride data.
"Building Real-Time Data Pipelines with Kafka and MemSQL" by Rick Negrin, Director of Product Management at MemSQL for Orange County Roadshow March 17, 2017.
Building scalable software requires designing it so that adding more hardware allows the software to utilize that hardware. Key considerations include avoiding contention over shared resources like CPU, disk, memory and network. Examples of scalable architectures include lock-free skiplist indexes, sharding or partitioning data across multiple machines, distributed query execution, and columnar data stores. Building for scale changes how software features are developed, requiring simple initial designs, leveraging existing resources, ensuring the right technical decisions through code reviews and technical leadership.
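As a minimal illustration of the sharding idea in this summary, the sketch below hashes a shard key to route rows across nodes, so adding machines adds capacity. The node list, key choice, and routing rule are assumptions for the example, not the presenter's design.

```python
# Sketch: hash-based sharding of rows across a fixed set of nodes.
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]

def shard_for(key: str) -> str:
    """Deterministically map a shard key to a node."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def insert(row: dict) -> None:
    # Route the row by its shard key; contention stays local to one node.
    node = shard_for(row["user_id"])
    print(f"INSERT {row!r} routed to {node}")

insert({"user_id": "u42", "amount": 19.99})

# A distributed query would fan out to every node and merge the partial
# results, which is how sharded engines execute aggregates at scale.
```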
Building Reactive Real-time Data Pipeline (Trieu Nguyen)
Topic: Building a reactive real-time data pipeline at FPT
1) What is a "Data Pipeline"?
2) Big Data Problems at FPT
+ VnExpress: pageview and heat-map
+ eClick: real-time reactive advertising
3) Solutions and Patterns
4) Fast Data Architecture at FPT
5) Wrap up
The Real-Time CDO and the Cloud-Forward Path to Predictive Analytics (SingleStore)
Nikita Shamgunov presented on the Real-Time Chief Data Officer and the cloud-forward path to predictive analytics. He discussed how MemSQL provides a modern data architecture that enables real-time access to all data, flexible deployments across public/private clouds, and a 360 view of the business without data silos. He showcased several customer use cases that demonstrated transforming analytics from weekly to daily using MemSQL and reducing latency from days to minutes. Finally, he proposed strategies for building a hybrid cloud approach and real-time analytics infrastructure to gain faster historical insights and predictive capabilities.
In this video from the 2014 HPC User Forum in Seattle, Amit Vij and Nima Neghaban from GIS Federal present "GPUdb: A Distributed Database for Many-Core Devices."
Learn more: http://insidehpc.com/video-gallery-hpc-user-forum-2014-seattle/
and
http://gisfederal.com/
Watch the video presentation http://wp.me/p3RLHQ-ddd
Our journey with druid - from initial research to full production scale (Itai Yaffe)
Here at the Nielsen Marketing Cloud we use druid.io (http://druid.io/) as one of our main data stores, both for simple counts and for approximate count-distinct (DataSketches).
It’s been more than a year since we started using it, ingesting billions of events each day into multiple Druid clusters for different use cases.
In this meetup, we will share our journey, the challenges we faced, the way we overcame them (at least most of them), and the steps we took to optimize the process around Druid to keep the solution cost-effective.
Before diving into Druid, we will briefly present our data pipeline architecture, starting from the front-end serving system, deployed in a number of geo-locations, to a centralized Kafka cluster in the cloud, and give some examples of the different processes that consume from Kafka and feed our different data sources.
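For readers unfamiliar with the approximate count-distinct mentioned above, here is a small sketch using the Apache DataSketches Python bindings (pip install datasketches); the lg_k value and sample data are assumptions. The property Druid relies on is that sketches can be merged, so distinct counts combine across clusters without reprocessing raw events.

```python
# Sketch: approximate distinct-user counts with mergeable HLL sketches.
from datasketches import hll_sketch, hll_union

# One sketch per cluster / geo-location.
us_sketch, eu_sketch = hll_sketch(12), hll_sketch(12)
for user in ("u1", "u2", "u3"):
    us_sketch.update(user)
for user in ("u2", "u3", "u4"):
    eu_sketch.update(user)

# Merge the sketches instead of shipping raw events between clusters.
union = hll_union(12)
union.update(us_sketch)
union.update(eu_sketch)

# True distinct count is 4; the estimate will be very close at this size.
print(f"approximate distinct users: {union.get_result().get_estimate():.1f}")
```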
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent) (Albert Wong)
Building a data platform doesn’t have to be like entering a portal to Stranger Things.
Join us in one hour for Tableau in the Cloud: A Netflix Original where Albert Wong, Netflix’s analytics expert, will show you how to simplify your data stack to deliver self-service analytics at scale.
Albert will discuss the details of connecting to big data, finding datasets, and discovering critical insights from visualizations. He will also share how Netflix is developing and growing their analytics ecosystem with Tableau, and how they prioritize sustaining their data culture of freedom and responsibility.
Advanced Visual Analytics and Real-time Analytics at Platform scale by Brian ... (The Hive)
Some of the most demanding real-time big data driven platforms on the Internet today are in programmatic advertising and real-time bidding.
These platforms continuously ingest, store, analyze and act on billions of events and terabytes of data to personalize interactions with every click and swipe across websites, mobile apps, emails, social media, sensors and more. But that’s not enough. In order to win at auction, capture the user’s attention and drive revenue, they must continuously extract new insights with advanced visual analytics and combine these insights with real-time data to perform real-time analytics, moment-by-moment, all the time.
Brian Bulkowski, co-founder & CTO of Aerospike, an open source flash-optimized NoSQL database, will talk about the latest developments in storage and lead a discussion with Kiran about the challenges and opportunities created for analytics at platform scale.
Splunk live! São Paulo 2014 - Edenred-Ticket (Splunk)
The document describes how Edenred, the world leader in prepaid service cards and vouchers, implemented Splunk to centralize logs and improve visibility into and analysis of network and system security and performance. Before Splunk, Edenred faced challenges such as slow incident analysis and a lack of history and real-time metrics. With Splunk in place, the company began centralizing logs from Active Directory, PCI projects, and firewalls, among other sources, to speed up responses and audits.
The document discusses how VTEX uses Splunk to collect and analyze logs, metrics, and machine data for monitoring and to provide business insights to its customers. Before Splunk, VTEX struggled to centralize and analyze the large volumes of data it generated. Splunk enabled a centralized environment for logs and the development of apps for specific analyses.
The document discusses the implementation of Splunk at Produban to improve the detection of and response to security incidents. The previous SIEM no longer met the company's needs in terms of data volume, availability, and customization. After testing, Splunk proved much faster while requiring less hardware. Splunk enables automated threat response, integrating multiple intelligence sources and applying actions directly on security devices. This sped up incident response and helped prevent new incidents.
Here is some interesting material on what Vodafone is doing with Splunk.
This presentation in particular was given at .conf2013, Splunk's worldwide user conference; .conf2014 takes place in October this year, so plan ahead and attend, it's worth every penny!
As a reminder, registration for .conf2014 is already open at a promotional price.
More information here: http://conf.splunk.com/?r=homepage
SplunkLive! Hamburg / München Advanced Session (Georg Knon)
This document provides an agenda and overview for an advanced Splunk training workshop. The agenda includes discussions of building apps, users and roles, and an example Splunk app. It assumes participants have advanced Splunk skills and experience building searches, reports, dashboards and sourcetypes. It aims to teach participants how to create custom Splunk apps with navigable views, integrate authentication, and customize the user interface.
The Climate Corporation started as an insurance company in 2006 and later changed its focus to agriculture data analytics. The author joined The Climate Corporation in 2015 and helped build out their security infrastructure using Splunk, onboarding various data sources and ensuring data integrity. Their use of Splunk includes application visibility, alerting capabilities, and plans to expand data sources and search capabilities over time.
The document describes how Universo Online uses Splunk to monitor e-commerce transactions, make business decisions in real time, and measure the return on online media investments. Splunk provides centralized dashboards that improved visibility across the monitoring, R&D, and business teams.
This document summarizes how several customers in Brazil are using Splunk to gain real-time operational visibility and business insights. Customers mentioned include PagSeguro, BM&F Bovespa, and Experian.
BM&FBOVESPA centralizes logs from its trading platforms in Splunk for real-time monitoring and report generation. Splunk makes it possible to filter and optimize data for applications that monitor servers, messaging, and the Splunk environment itself. These applications were developed by CME, Splunk, and Silverlink Technologies to address BM&FBOVESPA's monitoring challenges.
This document describes VTEX's use of Splunk to manage logs and metrics for more than 1,000 customers. VTEX started out using Splunk to store 2GB of data and now stores 65GB to deliver insights that improve decision-making. Splunk makes it possible to monitor performance, identify anomalies, and increase conversion.
The document discusses how 99Taxis uses Splunk to aggregate system logs, enabling cross-system searches, real-time monitoring of key metrics, and analyses that improve agility and decision-making. This overcame visibility and troubleshooting challenges in a complex environment with dozens of systems and 100GB of logs per day.
This document is the agenda for a Splunk event featuring customer success stories from Produban, Vtex, PagSeguro, and Edenred. The agenda includes a welcome, an overview of Splunk, four customer success-story presentations, coffee breaks, and a closing happy hour.
Building an Analytics-Enabled SOC Breakout Session (Splunk)
This document provides an overview of building an analytics-enabled security operations center (SOC). It discusses the three main components of a SOC - process, people, and technology. For process, it covers threat modeling, playbooks, tier structures, shift rotations, and other operational aspects. For people, it describes the different roles required in a SOC. For technology, it promotes Splunk Enterprise as a security intelligence platform that can power all functions of a SOC. It also provides examples of how Splunk can be used for various SOC use cases and processes.
This document provides an overview and agenda for the Splunk App for Stream, including:
- The architecture of the Stream Forwarder for capturing wire data and routing it to Splunk.
- The architecture of the App for Stream for analyzing wire data in Splunk.
- Examples of deployment architectures for ingesting wire data.
- A customer use case where wire data from the network helped provide visibility that log data could not due to access restrictions.
Splunk as a_big_data_platform_for_developers_spring_one2gx (Damien Dallimore)
Splunk is a platform for collecting, analyzing, and visualizing machine data. It provides real-time search and reporting across IT systems and infrastructure. Splunk indexes data from various sources without needing predefined schemas, and scales to handle large volumes of data from thousands of systems. The presentation covers an overview of the Splunk platform and how it can be used by developers, including custom visualizations, the Java SDK, and integrations with Spring applications.
The document summarizes the history of Bristol Bay Red King Crab and Bering Sea Snow Crab fisheries. It describes how the Red King Crab fishery began in the 1930s with Japanese and Russian fleets and expanded with the US trawl fleet in the late 1940s-1980s. It then discusses management changes over time, including the implementation of crab rationalization in 2005. For snow crab, it notes the fishery began as incidental catch in 1977 and increased until peaking in 1991, then decreasing dramatically by 1996 before stabilizing in the late 1990s.
'Best Practices' & 'Context-Driven' - Building a bridge (2003) (Neil Thompson)
The document outlines a presentation on bridging the gap between "best practices" and "context-driven" approaches to test management. It discusses learning objectives around understanding each perspective and applying Goldratt's theory of constraints thinking tools. The presentation will cover examples of using these tools to analyze testing practices and determine what is most appropriate based on different project contexts.
The document provides an overview of the Illinois Army National Guard's 33rd Infantry Brigade Combat Team (IBCT) preparing for and participating in the eXportable Combat Training Capability (XCTC) annual training exercise at Camp Ripley, Minnesota. It discusses the IBCT setting up a Tactical Operations Center and conveying over 2,000 soldiers, vehicles, and equipment from across Illinois to the training site. It also previews distinguished visitors attending and emphasizes that the training will validate the brigade's readiness and identify areas for improvement.
- TELUS is Canada's second largest mobile carrier with 8.4 million subscribers and focuses on customer experience.
- Philippe Tang uses Splunk as a network specialist to monitor TELUS' wireless network and customer experience.
- Previously, Tang used multiple tools to extract and correlate network performance data which was time-consuming, but Splunk allows him to aggregate this data in one place.
- At TELUS, Splunk collects over 80GB of data per day from network devices, social media, and systems. It is used for monitoring, alerts, dashboards and reporting on key performance indicators.
Getting Started with Splunk Breakout Session (Splunk)
This document provides an overview and agenda for a presentation on getting started with Splunk Enterprise. The presentation covers an overview of Splunk Inc. and the Splunk platform, a live demonstration of using Splunk to install, index, search, create reports and dashboards, and set alerts. It also discusses deploying Splunk in distributed architectures, the Splunk community resources, and support options. The goal is to help attendees understand how to use the key capabilities of Splunk Enterprise.
Getting Started with Splunk Breakout Session (Splunk)
This document provides an overview and introduction to Splunk Enterprise. It begins with an agenda that outlines discussing Splunk Enterprise, a live demonstration of using Splunk, deployment architecture, the Splunk community, and a Q&A. It then discusses how Splunk can unlock insights from machine data generated from various sources. The live demo shows installing Splunk, forwarding sample data, and performing searches. It also discusses deploying Splunk at scale, distributed architectures, and support resources available through the Splunk community.
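As a small illustration of the kind of searches these getting-started sessions demonstrate, here is a hedged sketch using the Splunk SDK for Python (pip install splunk-sdk); the host, credentials, and example query are placeholders.

```python
# Sketch: run a blocking "oneshot" search and print the results.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="localhost", port=8089,          # Splunk management port
    username="admin", password="changeme")

rr = service.jobs.oneshot(
    "search index=_internal | stats count by sourcetype",
    output_mode="json")

for event in results.JSONResultsReader(rr):
    if isinstance(event, dict):           # skip diagnostic messages
        print(event["sourcetype"], event["count"])
```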
Danfoss - Splunk for Vulnerability Management (Splunk)
This document summarizes a presentation about Danfoss' use of Splunk for vulnerability management. It provides an overview of Danfoss, the background and experience of the presenter, how Danfoss got started with Splunk in 2008 to meet log collection and retention requirements, and how their use of Splunk has evolved over time to include dashboards, security, automated alerting, and a Sophos antivirus case study. It outlines next steps of expanding Splunk's use to more teams and exploring advanced analytics.
Getting Started with Splunk Enterprise Hands-On Breakout Session (Splunk)
This document provides an overview and demonstration of Splunk Enterprise. It discusses what machine data is and Splunk's mission to make it accessible. The presentation covers installing and onboarding data into Splunk, performing searches, creating dashboards and alerts. It also summarizes deployment architectures for Splunk and options for support and learning more.
Getting Started with Splunk Breakout Session (Splunk)
Splunk is a software platform that allows users to search, monitor, and analyze machine-generated big data via a web-style interface. It collects and indexes data from various sources like servers, networks, sensors, and applications. Splunk is used by over 9,000 customers across various industries for security intelligence, IT operations, and business analytics. It offers role-based access controls, multi-tenancy, and secure data transmission. Splunk components include universal forwarders for data collection, indexers for data storage, and search heads for data presentation.
Getting Started with Splunk Breakout Session (Splunk)
Splunk is a software company that provides software for searching, monitoring, and analyzing machine-generated big data via a web-style interface. The document discusses why organizations use Splunk, provides an overview of the company and its products, describes how Splunk works and how to get started with it. It also advertises Splunk's upcoming user conference to provide training, certification, and opportunities to learn from customers and partners about using Splunk.
The document discusses how the Earth System Grid Federation (ESGF) leverages tools from Apache Solr and Apache Object Oriented Data Technology (OODT) to manage and distribute large amounts of climate science data. ESGF is an international collaboration that uses a distributed network of nodes running various software components to provide access to over 2.5 petabytes of climate model output and observational data. This infrastructure supports the research of the Intergovernmental Panel on Climate Change and projects like CMIP5, the largest coordinated climate modeling effort to date.
SplunkLive! Washington DC May 2013 - Splunk Enterprise 5 (Splunk)
This document provides an overview of Splunk Enterprise 5 software. The key points are:
1. Splunk Enterprise 5 provides reports that run up to 1000x faster through new report acceleration technology, easier creation of dynamic drill-downs, and integrated PDF sharing capabilities.
2. It offers enterprise-scale resilience and high availability through features like index replication that allows indexed data to remain searchable even if an indexer fails.
3. The software includes enhanced modularity, interoperability and extensibility through tools like modular inputs that simplify adding new data sources, and APIs/SDKs that allow developers to integrate Splunk with other technologies.
Splunk in the Cisco Unified Computing System (UCS) (Splunk)
Cisco has been a Splunk customer for 8 years, with a strong engineering partnership for 3+ years. Learn how several Cisco customers as well as Cisco IT have deployed, grown, and transformed our businesses using the advantages of Splunk Enterprise software together with Cisco UCS and Nexus hardware. We will also talk about scalability and performance considerations for all scales of data footprint and business growth.
Getting Started with Splunk Enterprise Hands-On (Splunk)
This document provides an overview and demonstration of Splunk software. The agenda includes downloading Splunk, an overview of its key features for searching machine data, field extraction, dashboards, alerting, and analytics. The presenter then demonstrates installing and onboarding sample data, performing searches, and using pivots. Deployment architectures are discussed, along with scaling to hundreds of terabytes per day. Resources like documentation, support, and the Splunk user conference are also mentioned.
LucidWorks App for Splunk Enterprise is the first of its kind, specifically designed to allow companies to analyze and manage the health and availability of their Solr deployments in Splunk software. The solution integrates multi-structured data indexed by Solr directly into Splunk® Enterprise, giving system administrators the ability to look at the intersection of documents, customer records or other unstructured data sources as they relate to machine data. This enables companies to optimize their Solr applications, glean insights from search and usage patterns and spot security concerns to improve end user experiences and derive more business value from data-driven applications.
This webinar will explore the features of the App, and provide attendees with valuable information on the following key components:
Solr Monitor: Monitor the health, availability, and utilization of LucidWorks and/or Solr deployments with pre-defined data inputs, dashboards and reports
Search Analytics: Perform user behavior and click-stream analysis with pre-built search analytics reports and fields
NoSQL Lookups: Use Splunk’s lookup facility to enrich your Splunk reports with data of any structure from Solr’s fully indexed and searchable NoSQL datastore
Search Time Joins: Join Splunk data with human generated and other unstructured data sources stored in Solr at search time for developing data-driven applications
1) Cisco has been using Splunk Enterprise for over 7 years across many business units and teams, with daily indexing growing from 300GB in 2010 to over 2TB currently.
2) Cisco's Computer Security Incident Response Team (CSIRT) uses Splunk as their security information and event management (SIEM) platform to monitor 350TB of stored data across 60 global users.
3) The presentation discusses how Cisco and some of its customers have successfully deployed Splunk on Cisco Unified Computing System (UCS) servers to scale their Splunk environments and gain benefits of simplified and repeatable deployments.
Splunk Dashboarding & Universal Vs. Heavy Forwarders (Harry McLaren)
This document provides an agenda and summaries for a Splunk user group meeting in Edinburgh. The meeting will include presentations and discussions on creating dashboards, using universal vs. heavy forwarders, and latest Splunk challenges and solutions. It introduces the speakers, including employees from the hosting company ECS and user group leader Harry McLaren. Updates from the recent Splunk .conf event are also summarized, such as new premium app releases and the Splunk ML Toolkit.
Splunk is a software platform that allows users to search, monitor, and analyze machine-generated data in real-time. It is used by over 10,000 customers across many industries to gain operational intelligence. Splunk indexes data from various sources like servers, networks, applications, and devices and allows users to interact with the data through searching, reporting, visualization, and alerting. It provides universal access to data regardless of format or source, and scales from small environments to very large ones processing hundreds of terabytes per day.
The document discusses Grid computing and the Globus Toolkit. It provides an overview of Grid computing, describing it as the sharing of computer resources and coordinated problem solving across multiple institutions. It then summarizes the Globus Toolkit, describing it as open source software that provides basic components for Grid functionality, including security, execution management, data management, and monitoring. The Globus Toolkit aims to make it easier to build collaborative distributed applications that can exploit shared Grid infrastructure.
If mobile apps are part of your business, having real-time insight on app performance, crashes, usage and transactions is critical. Data derived directly from mobile app usage—called “mobile data”—can help you deliver better performing apps and increase application visibility. With the massive increase in smartphone and mobile app usage, your app’s performance is more important than ever. Learn how to gain Operational Intelligence from your mobile apps with Splunk MINT.
Similar to Exxon - SplunkLive! São Paulo 2015
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
2. Agenda
About ExxonMobil and Geoffrey Martins
Why Shared Service?
The Four Major Challenges
Final Unified Network
Potential Next Steps
Takeaways
Q&A
3. Largest International Oil & Gas Company in the World
75,000 employees worldwide
Presence in 100+ countries
2014 numbers
– Gross income: $411 billion
– Net income: $32 billion
Worldwide support center in Brazil – Curitiba, PR
– 1,200 employees
– 800 in IT alone!
4. Geoffrey Martins
Splunk Architect in Analytics E&D
– Lives in Curitiba, Brazil
– 8 years with ExxonMobil
.NET Developer
SAP BW Consultant
– Master's degree in Computer Science
– PhD student at UFPR
5. Why Shared Service?
• The scenario at the end of 2013:
• Splunk first brought to the company in 2012
• Several independent Splunk networks for different departments
• Compartmentalized information
• Duplicated data ingestion
• Divergent reports coming from different instances
• Separate support teams and separate development teams
• No standardization between instances
• No dev/sandbox environment
6. Why Shared Service?
• The challenge: a single worldwide Splunk network
• Aim for a single Splunk network
• Exploit Splunk's main advantage: data sharing and collaboration
• Optimize data acquisition; no duplicates
• Standardize development and developers, all working in a single direction
• Make developers aware of each other
• Share code, share ideas
• Unify the user base
• Unify support
7. The Four Major Challenges:
> Unify Infrastructure
> Single User Base
> Solid Support Team
> The Massive Data Unification
8. Unify Infrastructure
Consolidate all licenses on a single license server (see the config sketch below)
Expand presence to all continents
– Concentrate and transform data closer to its origin
– Indexers in Asia and Europe
– Forwarders in Asia, Europe, Africa and South America
Add power to search heads
– Move from completely separate search heads to two main Search Head Clusters:
  General Purpose
  CyberSecurity-Exclusive
Create a real region-based structure
– Store data closer to its origin
– Smaller transfers between sites
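The deck does not include configuration, but a minimal sketch of the single-license-server setup would point every indexer at the central license master in server.conf (the hostname below is hypothetical):

    # server.conf on each indexer (license slave); hostname is hypothetical
    [license]
    master_uri = https://splunk-license.example.com:8089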
9. Unify User Base
Identify existing power users and train new ones
– Create a real community of Splunk power users
– Establish rules for becoming a power user (a sample role definition follows this list):
  Attend three official Splunk courses
Establish an ownership process for data and apps
– Each index must have a data owner
– Each app must have an owner and a responsible power user
Establish periodic power user meetings
– Power users know what the others are working on
– Opportunity to showcase apps, ask questions and get help
– Exchange ideas, use cases, etc…
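As a hedged sketch of how such a power-user role might be expressed in authorize.conf, assuming a hypothetical role name and quotas (the deck does not show the actual definitions):

    # authorize.conf; role name and quotas are hypothetical
    [role_em_poweruser]
    # inherit Splunk's built-in power and user roles
    importRoles = power;user
    # searchable indexes; tighten this where data must stay segregated
    srchIndexesAllowed = *
    srchIndexesDefault = main
    # cap concurrent search jobs so power users cannot starve the cluster
    srchJobsQuota = 10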
10. A Solid Support Team
Centralized in a single IT team
Mix of in-house apps and Splunk-provided solutions
In-house app for real-time health monitoring (Uber Admin); a sample health check follows this list
Splunk and 3rd-party apps for network and Universal Forwarder management
– Distributed Management Console and SOS
– TA-ForwarderQuery
– FireBrigade, Deployment Monitor, UtilizationMonitor…
Train a support team and integrate it into the community
Make support and the Splunk administrators easy to reach
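Uber Admin's internals are not shown in the deck; as a hedged example, a health-monitoring app of this kind might flag silent forwarders with a search over Splunk's internal metrics, along these lines:

    index=_internal source=*metrics.log* group=tcpin_connections
    | stats max(_time) AS last_seen BY hostname
    | eval minutes_silent = round((now() - last_seen) / 60)
    | where minutes_silent > 15
    | sort - minutes_silent

Each tcpin_connections event records an incoming forwarder connection, so a hostname whose newest event is more than 15 minutes old has likely stopped sending.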
15. A Solid Development Environment
Creation of a development network
– 1 Search Head, 2 Indexers, 2 Heavy Forwarders
– Exclusive to power users and admins
– Change-management process (a command sketch follows this list):
  All development happens on the dev network.
  Once an app reaches production quality, admins move it to the production network.
  A dedicated allocation is reserved for the dev network.
Sandbox environment
– Single all-in-one server
– No-man's land: everyone can do anything
– Open area for experiments and prototyping
– Useful for deciding whether Splunk is the right solution for a given data set
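One hedged way to implement the dev-to-prod promotion step with standard Splunk tooling; the app name and credentials are hypothetical:

    # package the app on the dev search head (app name is hypothetical)
    tar -czf em_capacity_app.tgz -C $SPLUNK_HOME/etc/apps em_capacity_app

    # install the package on the production search head
    $SPLUNK_HOME/bin/splunk install app em_capacity_app.tgz -auth admin:changeme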
16. The Massive Data Unification
Bring all indexers together in a single indexer layer
– Document the content of all indexes and make them visible (an inventory search follows this list)
– Make users aware of all the data available to them
  Each department can benefit from data coming from other departments.
  The main cause of duplicate ingestion is UNAWARENESS of existing data.
– Only segregate data when necessary. Keep data free!
  The company has strict rules for the management and protection of information.
  Candidates for segregation: private and/or proprietary data.
Leverage Splunk's distributed capabilities!
– Position your indexers and search heads strategically
– Know your data!
– Splunk runs on commodity hardware. Put it to use!
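As a hedged sketch of the index-documentation step, a REST-based search can inventory every index visible to the search head using standard fields:

    | rest /services/data/indexes
    | stats sum(currentDBSizeMB) AS total_size_mb BY title
    | sort - total_size_mb

Publishing the output of a search like this is one way to make users aware of what data already exists before they request a new ingestion.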
17. The Final Unified Network
4-node General Purpose SHC
1 segregated Search Head
3 Deployment Servers
1 License Server
30 Indexers: most in the US, some in Europe and Asia
22 Heavy Forwarders: all major sites, including Africa and South America
~6,000 Universal Forwarders (a sample forwarder configuration follows this list)
By October: all 15,000 servers
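The forwarder configuration is not shown in the deck; a hedged sketch of how a regional Universal Forwarder might load-balance across its nearest indexers (hostnames are hypothetical):

    # outputs.conf deployed to European forwarders; hostnames are hypothetical
    [tcpout]
    defaultGroup = emea_indexers

    [tcpout:emea_indexers]
    # automatic load balancing spreads events across the regional indexers
    server = idx1.emea.example.com:9997, idx2.emea.example.com:9997
    autoLB = true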
18. Potential Next Steps
Splunk Mobile App
– Bring Splunk Accessibility to ALL Company Devices
Splunk MINT
– Mobile Intelligence for In-House iOS Apps
Hunk
– Proof of Concept for Hadoop
19. Takeaways for a Successful Shared Service
Leverage your power users; make them known
– Awareness of each other is the key
– Your power users are your greatest resource
Unify your network; make your data visible
– Invest in documentation; know your data!
– Bring all your data together; avoid segregation unless necessary
– A development environment gives freedom and protects your Splunk network
Keep a close eye on your network
– Monitoring can let you find problems before they happen!
– Splunk has superb monitoring capabilities: USE THEM!
– Resiliency is cheap and essential. Be prepared.
– Take retention periods very seriously! (a retention sketch follows this list)
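As a hedged illustration of the retention point: in Splunk, retention is set per index in indexes.conf; the index name and the 90-day period below are hypothetical:

    # indexes.conf; index name and retention period are hypothetical
    [app_telemetry]
    homePath   = $SPLUNK_DB/app_telemetry/db
    coldPath   = $SPLUNK_DB/app_telemetry/colddb
    thawedPath = $SPLUNK_DB/app_telemetry/thaweddb
    # 90 days in seconds; older buckets are frozen (deleted or archived)
    frozenTimePeriodInSecs = 7776000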