Lambda architecture for real time big data (Trieu Nguyen)
Lambda Architecture in Real-time Big Data Project
Concepts & Techniques “Thinking with Lambda”
Case studies from real projects
Why is Lambda architecture the correct solution for big data?
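To make the "Thinking with Lambda" idea concrete, here is a minimal sketch of the usual three-part layout: an append-only master log, a batch view recomputed from the full log, and a speed view covering events that arrived since the last batch run. The class name `LambdaCounter` and the page-view use case are hypothetical illustrations, not taken from the slides.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style page-view counter (hypothetical example)."""

    def __init__(self):
        self.master_log = []          # append-only record of every raw event
        self.batch_view = Counter()   # recomputed from the whole log by the batch layer
        self.speed_view = Counter()   # incremental counts since the last batch run

    def ingest(self, page):
        # Every event is written to the master log and to the speed layer.
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # Batch layer: rebuild the view from scratch, then reset the speed layer,
        # because the new batch view now covers everything the speed view held.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, page):
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[page] + self.speed_view[page]


counter = LambdaCounter()
for page in ["home", "cart", "home"]:
    counter.ingest(page)
before_batch = counter.query("home")   # answered by the speed view alone

counter.run_batch()
counter.ingest("home")
after_batch = counter.query("home")    # batch view plus one new real-time event
```

In a real deployment the batch recomputation and the speed layer would run on separate systems; the sketch only shows why a query has to merge both views.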
My presentation slides from Hadoop Summit, San Jose, June 28, 2016. See live video at http://www.makedatauseful.com/vid-solving-performance-problems-hadoop/ and follow along for context.
Moving analytic workloads into production - specific technical challenges and best practices for engineering SQL in Hadoop solutions. Highlighting the next generation engineering approaches to the secret sauce we have implemented in the Actian VectorH database.
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ... (Databricks)
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
Many organizations focus on the licensing cost of Hadoop when considering migrating to a cloud platform. But other costs should be considered, as well as the biggest impact, which is the benefit of having a modern analytics platform that can handle all of your use cases. This session will cover lessons learned in assisting hundreds of companies to migrate from Hadoop to Databricks.
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016 (StampedeCon)
This session will detail best practices for architecting, building, operating and managing an Analytics Data Lake platform. Key topics will include:
1) Defining next-generation Data Lake architectures. The de facto standard has been commodity DAS servers with HDFS, but there are now multiple solutions aimed at separating compute and storage, virtualizing or containerizing Hadoop applications, and utilizing Hadoop-compatible or embedded HDFS filesystems. This portion will explore the options available, and the pros and cons of each.
2) Data Ingest. There are many ways to load data into a Data Lake, including standardized Apache tools (Sqoop, Flume, Kafka, Storm, Spark, NiFi), standard file and object protocols (SFTP, NFS, REST, WebHDFS), and proprietary tools (e.g., Zaloni Bedrock, DataTorrent). This section will explore these options in the context of best fit to workflows; it will also look at key gaps and challenges, particularly in the areas of data formats and integration with metadata/cataloging tools.
3) Metadata & Cataloguing. One of the biggest inhibitors of successful Data Lake deployments is Data Governance, particularly in the areas of indexing, cataloguing and metadata management. It is nearly impossible to run analytics on top of a Data Lake and get meaningful & timely results without solving these problems. This portion will explore both emerging open standards (Apache Atlas, HCatalog) and proprietary tools (Cloudera Navigator, Zaloni Bedrock/Mica, Informatica Metadata Manager), and balance the pros, cons and gaps of each.
4) Security & Access Controls. Solving these challenges is key for adoption in regulation-driven industries like Healthcare & Financial Services. There are multiple Apache projects and proprietary tools to address this, but the challenge is making security and access controls consistent across the entire application and infrastructure stack, and over the data lifecycle, and being able to audit this in the face of legal challenges. This portion will explore available options and best practices.
5) Provisioning & Workflow Management. The real promise of the Data Lake is integrating Analytics workflows and tools on converged infrastructure, with shared data, and building “As A Service” architectures that are oriented towards self-service data exploration and Analytics for end users. This is an emerging and immature area, but this session will explore some potential concepts, tools and options to achieve this.
This will be a moderately technical session, with the above topics being illustrated by real world examples. Attendees should have basic familiarity with Hadoop and the associated Apache projects.
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
When you look at traditional ERP or management systems, they are usually used to manage the supply chain originating from either the point of origin or the point of destination, which are all primarily physical locations. And for these, you have several processes like order to cash, source to pay, physical distribution, production etc.
In this session we take an in-depth look into the Apache Atlas open metadata and governance function.
Open metadata and governance is a moon-shot type of project to create a set of open APIs, types, and interchange protocols to allow all metadata repositories to share and exchange metadata. From this common base, it adds governance, discovery, and access frameworks to automate the collection, management, and use of metadata across an enterprise. The result is an enterprise catalog of data resources that are transparently assessed, governed, and used in order to deliver maximum value to the enterprise.
Apache Atlas is the reference implementation of the Open Metadata and Governance standards and framework (https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance). This function will enable an Apache Atlas server to synchronize and query metadata from any open metadata-compliant metadata repository.
In this session we will cover how Open Metadata and Governance works. This includes: (1) the key components in Atlas, (2) the different integration patterns and APIs that vendors can use to integrate their technology into the open metadata ecosystem, and (3) how common metadata use cases such as searching for data sets, managing security (through Atlas/Ranger integration), and automated metadata discovery work in the active ecosystem.
Speaker
Mandy Chessell, Distinguished Engineer, IBM
Securing and governing a multi-tenant data lake within the financial industry (DataWorks Summit)
Standard Bank South Africa is a Hortonworks client, with several multi-node clusters hosting Hortonworks Data Platform (HDP) and Hortonworks Data Flow (HDF). This presentation will discuss the technical detail of implementing security, governance and multi-tenancy on a "Data Lake" within the finance industry. The talk will address the team's experiences, challenges, failures and learnings that we took away from this behemoth of an adventure.
After introducing Standard Bank and the Hadoop admin team, the presentation will describe the security and governance journey Standard Bank has undergone since the project's inception in 2015, as well as the roadmap for the future ahead.
Presentation structure:
1. Team introduction with background information
2. Environment overview (Where we are - Current)
-----Security
---------Authentication through Kerberos and LDAP/ AD
---------Authorization through Ranger and Centrify
---------Transparent Data Encryption (TDE) at rest
-----Governance
---------Centralized auditing
---------Ranger policies and data steward ownership
-----Multi-Tenancy
---------Data lake Vs. data analytics platform
---------Edge nodes Vs. API framework through Knox
3. How did we get to this stage? (Past)
-----Challenges faced (Kerberos, AD integration, SSL)
-----How we overcame these challenges
4. Future challenges we foresee (Future)
-----How we are planning to prepare for them
Speakers
Ian Pillay, Hadoop Administrator, Standard Bank
Brad Smith, Hadoop Administrator, Standard Bank
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr... (DataWorks Summit)
For firms in the financial industry, especially within regulated organizations such as credit card processors and banks, PCI DSS compliance has become a business and operational necessity. Although the blueprint of a PCI-compliant architecture varies from organization to organization, the mixture of modern Hadoop-based data lakes and legacy systems is a common theme.
In this talk, we will discuss recent updates to PCI DSS and how significant portions of PCI DSS compliance controls can be achieved using the open source Hadoop security stack and technologies for the Hadoop ecosystem. We will provide a broad overview of implementing key aspects of PCI DSS standards at WorldPay such as encryption management, data protection with anonymization, separation of duties, and deployment considerations regarding securing the Hadoop clusters at the network layer from a practitioner’s perspective. The talk will provide patterns and practices that map current Hadoop security capabilities to the security controls that a PCI-compliant environment requires.
Speakers
David Walker, Enterprise Data Platform Programme Director, Worldpay
Srikanth Venkat, Senior Director Product Management, Hortonworks
Optimizing industrial operations using the big data ecosystem (DataWorks Summit)
GE Digital is undertaking a journey to optimize the reliability, availability, and efficiency of assets in the industrial sector and converge IT and OT. To do so, GE Digital is building cloud-based products that enable customers to analyze the asset data, detect anomalies, and provide recommendations for operating plants efficiently while increasing productivity. In an energy sector such as oil and gas, power, or renewables, a single plant comprises multiple complex assets, such as steam turbines, gas turbines, and compressors, to generate power. Each system contains various sensors to detect the operating conditions of the assets, generating large volumes of varied data. A highly scalable distributed environment is required to analyze such a large volume of data and provide operating insights in near real time.
In this session I will share the challenges we encountered when analyzing large volumes of data, our approach to in-stream data analysis, how we standardized the industrial data based on data frames, and performance tuning.
Big data ingest frameworks ship with an array of connectors for common data origins and destinations, such as flat files, S3, HDFS, Kafka etc, but sometimes, you need to send data to, or receive data from a system that's not on the list. StreamSets includes template code for building your own connectors and processors; we'll walk through the process of building a simple destination that sends data to a REST web service, and show how it can be extended to target more sophisticated systems such as Salesforce Wave Analytics.
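The idea of a simple destination that sends data to a REST web service can be sketched as follows. This is not the StreamSets template code or API (StreamSets stages are written against its own SDK); it is a standalone illustration in Python of the general pattern: buffer records, serialize a batch as JSON, and POST it. The class `RestDestination`, the helper `post_json`, the `batch_size` parameter, and the example URL are all hypothetical.

```python
import json
import urllib.request


def post_json(url, payload):
    """Default sender: POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for most 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status


class RestDestination:
    """Buffers records and sends them to a REST endpoint in batches."""

    def __init__(self, url, batch_size=100, sender=post_json):
        self.url = url
        self.batch_size = batch_size
        self.sender = sender      # injectable so the class can be tested offline
        self.buffer = []

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        payload = json.dumps(self.buffer).encode("utf-8")
        status = self.sender(self.url, payload)
        if not 200 <= status < 300:
            # Keep the buffer so the caller can retry the same batch.
            raise RuntimeError(f"destination rejected batch: HTTP {status}")
        self.buffer = []


# Offline usage with a stand-in sender instead of a real web service.
sent = []


def fake_sender(url, payload):
    sent.append((url, json.loads(payload)))
    return 200


destination = RestDestination(
    "http://example.invalid/ingest", batch_size=2, sender=fake_sender
)
for record in [{"id": 1}, {"id": 2}, {"id": 3}]:
    destination.write(record)
destination.flush()  # send the final partial batch
```

Making the sender injectable is a design choice for this sketch: targeting a more sophisticated system would mean swapping in a sender that handles that system's authentication and payload format, while the batching logic stays the same.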
Information processing and analytics cannot be focused only on “store-first” or batch-based approaches. To provide maximum business value, information must also be analyzed closer to the source, and at the speed in which it is being created. Streaming analytics utilizes various techniques for intelligently processing data as it arrives at the edge or within the data center, with the purpose of proactively identifying threats or opportunities for your business.
Streaming Analytics - Comparison of Open Source Frameworks and Products (Kai Wähner)
Stream Processing is a concept used to create a high-performance system for rapidly building applications that analyze and act on real-time streaming data. Benefits, amongst others, are faster processing and reaction to real-time complex event streams and the flexibility to quickly adapt to changing business and analytic needs. Big data, cloud, mobile and internet of things are the major drivers for stream processing and streaming analytics.
This session discusses the technical concepts of stream processing and how it is related to big data, mobile, cloud and internet of things. Different use cases such as predictive fault management or fraud detection are used to show and compare alternative frameworks and products for stream processing and streaming analytics.
The audience will understand when to use open source frameworks such as Apache Storm, Apache Spark or Esper, and powerful engines from software vendors such as IBM InfoSphere Streams or TIBCO StreamBase. Live demos will give the audience a good feeling about how to use these frameworks and tools.
The session will also discuss how stream processing is related to Hadoop and statistical analysis with software such as SAS, Apache Spark’s MLlib or R language.
Smart Manufacturing and Industry 4.0 - Tibco PoV (Nicola Sandoli)
Smart Manufacturing and Industry 4.0: generating new insights and operational intelligence.
Manufacturers are increasingly relying on advanced analytics to understand data, anticipate costly downtime and take proactive steps to prevent it, and improve operational efficiency. Collecting real-time sensor data and mashups, and applying machine learning techniques, lets you uncover hidden insights into potential equipment failures and operational discrepancies before they happen.
We’re in the midst of an exciting paradigm shift in terms of how we process events data in real time to better react to business opportunities or risk. To stay ahead of your competition, you need the ability to react to business-critical events as they happen. These critical events are created through diverse sources such as social interaction, machine sensors, or a customer transaction. How can you understand the meaning and context of these events that ultimately define your business?
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ... (Codemotion)
The world gets more connected every year due to Mobile, Cloud and Internet of Things. "Big Data" is currently a big hype. Large amounts of historical data are stored in Hadoop to find patterns, e.g. for predictive maintenance or cross-selling. But how can you increase revenue or reduce risks in new transactions? "Fast Data" via stream processing is the solution to embed patterns into future actions in real-time. This session discusses how machine learning and analytic models with R, Spark MLlib, H2O, etc. can be integrated into real-time event processing. A live demo concludes the session.
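As a rough illustration of embedding an analytic model into real-time event processing, the sketch below scores each incoming event with a model whose parameters were fitted elsewhere. Everything specific here is an assumption for illustration: the logistic-regression form, the `WEIGHTS` and `BIAS` values, the feature names, and the functions `score` and `flag_events`. A real pipeline would load a model exported from R, Spark MLlib, H2O or a similar tool and run inside a stream processing engine.

```python
import math

# Hypothetical coefficients standing in for a model trained offline on historical data.
WEIGHTS = {"amount": 0.004, "foreign": 1.5}
BIAS = -4.0


def score(event):
    """Return the model's risk probability for one event (logistic regression)."""
    z = BIAS + sum(
        weight * float(event.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def flag_events(events, threshold=0.5):
    """Score events one at a time as they arrive and yield the risky ones."""
    for event in events:
        probability = score(event)
        if probability >= threshold:
            yield event, probability


# Any iterable works as the stream; a list stands in for a message queue here.
stream = [
    {"id": "t1", "amount": 50, "foreign": 0},
    {"id": "t2", "amount": 900, "foreign": 1},
]
flagged = list(flag_events(stream))
```

With these made-up coefficients only the second transaction crosses the threshold. The point of the pattern is that training happens on historical data, while scoring is a cheap per-event function call that fits inside the streaming path.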
NoSQL in Practice with TIBCO: Real World Use Cases and Customer Success Stori... (Kai Wähner)
NoSQL is not just about different storage alternatives such as document store, key value store, graphs or column-based databases. The hardware is also getting much more important. Besides common disks and SSDs, enterprises are beginning to use in-memory storage more and more because a distributed in-memory data grid provides very fast data access and update. While its performance will vary depending on multiple factors, it is not uncommon to be 100 times faster than corresponding database implementations. For this reason and others described in this session, in-memory computing is a great solution for lifting the burden of big data, reducing reliance on costly transactional systems, and building highly scalable, fault-tolerant applications. The session begins with a short introduction to in-memory computing. Afterwards, different frameworks and product alternatives are discussed for implementing in-memory solutions. Finally, the main part of this session shows several different real world use cases where in-memory computing delivers business value by supercharging the infrastructure, e.g. to accelerate services, handle spikes in processing or ensure fault tolerance and disaster recovery.
A lot of in-memory data grid products are available. TIBCO ActiveSpaces, Oracle Coherence, Infinispan, IBM WebSphere eXtreme Scale, Hazelcast, Gigaspaces, GridGain, Pivotal Gemfire to name most of the important ones.
Take Action: The New Reality of Data-Driven Business (Inside Analysis)
The Briefing Room with Dr. Robin Bloor and WebAction
Live Webcast on July 23, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=360d371d3a49ad256942f55350aa0a8b
The waiting used to be the hardest part, but not anymore. Today’s cutting-edge enterprises can seize opportunities faster than ever, thanks to an array of technologies that enable real-time responsiveness across the spectrum of business processes. Early adopters are solving critical business challenges by enabling the rapid-fire design, development and production of very specific applications. Functionality can range from improved customer engagement to dynamic machine-to-machine interactions.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor, who will tout a new era in data-driven organizations, and why a data flow architecture will soon be critical for industry leaders. He’ll be briefed by Sami Akbay of WebAction, who will showcase his company’s real-time data management platform, which combines all the component parts needed to access, process and leverage data big and small. He’ll explain how this new approach can provide game-changing power to organizations of all types and sizes.
Visit InsideAnalysis.com for more information.
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti... (Kai Wähner)
A lot of data grid products are available. TIBCO ActiveSpaces, Oracle Coherence, Infinispan, IBM WebSphere eXtreme Scale, Hazelcast, Gigaspaces, GridGain, Pivotal Gemfire to name most of the important ones. Not SAP HANA!
The goal of my talk was not very technical. Instead, I discussed several different real world use cases and success stories for using in-memory data grids. Here is the abstract for my talk:
NoSQL is not just about different storage alternatives such as document store, key value store, graphs or column-based databases. The hardware is also getting much more important. Besides common disks and SSDs, enterprises are beginning to use in-memory storage more and more because a distributed in-memory data grid provides very fast data access and update. While its performance will vary depending on multiple factors, it is not uncommon to be 100 times faster than corresponding database implementations. For this reason and others described in this session, in-memory computing is a great solution for lifting the burden of big data, reducing reliance on costly transactional systems, and building highly scalable, fault-tolerant applications. The session begins with a short introduction to in-memory computing. Afterwards, different frameworks and product alternatives are discussed for implementing in-memory solutions. Finally, the main part of this session shows several different real world use cases where in-memory computing delivers business value by supercharging the infrastructure.
Digital Business Transformation in the Streaming EraAttunity
Enterprises are rapidly adopting stream computing backbones, in-memory data stores, change data capture, and other low-latency approaches for end-to-end applications. As businesses modernize their data architectures over the next several years, they will begin to evolve toward all-streaming architectures. In this webcast, Wikibon, Attunity, and MemSQL will discuss how enterprise data professionals should migrate their legacy architectures in this direction. They will provide guidance for migrating data lakes, data warehouses, data governance, and transactional databases to support all-streaming architectures for complex cloud and edge applications. They will discuss how this new architecture will drive enterprise strategies for operationalizing artificial intelligence, mobile computing, the Internet of Things, and cloud-native microservices.
Link to the Wikibon report - wikibon.com/wikibons-2018-big-data-analytics-trends-forecast
Link to Attunity Streaming CDC Book Download - http://www.bit.ly/cdcbook
Link to MemSQL's Free Data Pipeline Book - http://go.memsql.com/oreilly-data-pipelines
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
If there were a buzzword of the hour, it would certainly be "data mesh"! This new architectural paradigm unlocks analytic data at scale and enables rapid access to an ever-growing number of distributed domain datasets for various usage scenarios.
As such, the data mesh addresses the most common weaknesses of the traditional centralized data lake or data platform architecture. And the heart of a data mesh infrastructure must be real-time, decoupled, reliable, and scalable.
This presentation explores how Apache Kafka, as an open and scalable decentralized real-time platform, can be the basis of a data mesh infrastructure and - complemented by many other data platforms like a data warehouse, data lake, and lakehouse - solve real business problems.
There is no silver bullet or single technology/product/cloud service for implementing a data mesh. The key outcome of a data mesh architecture is the ability to build data products with the right tool for the job.
A good data mesh combines data streaming technology like Apache Kafka or Confluent Cloud with cloud-native data warehouse and data lake architectures from Snowflake, Databricks, Google BigQuery, et al.
Assessing New Databases– Translytical Use CasesDATAVERSITY
Organizations run their day-in-and-day-out businesses with transactional applications and databases. On the other hand, organizations glean insights and make critical decisions using analytical databases and business intelligence tools.
The transactional workloads are relegated to database engines designed and tuned for transactional high throughput. Meanwhile, the big data generated by all the transactions require analytics platforms to load, store, and analyze volumes of data at high speed, providing timely insights to businesses.
Thus, in conventional information architectures, this requires two different database architectures and platforms: online transactional processing (OLTP) platforms to handle transactional workloads and online analytical processing (OLAP) engines to perform analytics and reporting.
Today, a particular focus and interest of operational analytics includes streaming data ingest and analysis in real time. Some refer to operational analytics as hybrid transaction/analytical processing (HTAP), translytical, or hybrid operational analytic processing (HOAP). We’ll address if this model is a way to create efficiencies in our environments.
Kai Wähner – Real World Use Cases for Realtime In-Memory Computing - NoSQL ma...NoSQLmatters
Kai Wähner – Real World Use Cases for Realtime In-Memory Computing
NoSQL is not just about different storage alternatives such as document store, key value store, graphs or column-based databases. The hardware is also getting much more important. Besides common disks and SSDs, enterprises begin to use in-memory storage more and more because a distributed in-memory data grid provides very fast data access and update. While its performance will vary depending on multiple factors, it is not uncommon to be 100 times faster than corresponding database implementations. For this reason and others described in this session, in-memory computing is a great solution for lifting the burden of big data, reducing reliance on costly transactional systems, and building highly scalable, fault-tolerant applications. The session begins with a short introduction to in-memory computing. Afterwards, different frameworks and product alternatives are discussed for implementing in-memory solutions. Finally, the main part of this session shows several different real-world use cases where in-memory computing delivers business value by supercharging the infrastructure.
The digital transformation is being driven by Mobile, Cloud and the Internet of Things. Disruptive business models leverage Big Data Analytics and Machine Learning.
"Big Data" is currently a big hype. Large amounts of historical data are stored in Hadoop or other platforms. Business Intelligence tools and statistical computing are used to draw new knowledge and to find patterns from this data, for example for promotions, cross-selling or fraud detection. The key challenge is how these findings can be integrated from historical data into new transactions in real time to make customers happy, increase revenue or prevent fraud. "Fast Data" via stream processing is the solution to embed patterns - which were obtained from analyzing historical data - into future transactions in real-time.
This session uses several real-world success stories to explain the concepts behind stream processing and its relation to Hadoop and other big data platforms. Using several case studies, it discusses how patterns and statistical models from R, Spark MLlib, H2O, and other technologies can be integrated into real-time processing. The session also points out why a Microservices architecture helps meet the agile requirements of these kinds of projects.
A brief overview of available open source frameworks and commercial products shows possible options for the implementation of stream processing, such as Apache Storm, Apache Flink, Spark Streaming, IBM InfoSphere Streams, or TIBCO StreamBase.
A live demo shows how to implement stream processing, how to integrate machine learning, and how human operations can be enabled in addition to the automatic processing via a Web UI and push events.
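The core idea of this abstract, learning a pattern from historical data and then applying it to each new transaction as it arrives, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Transaction` type, the sample amounts, and the "mean plus k standard deviations" rule, which stands in for whatever analytic model (R, Spark MLlib, H2O, and so on) a real project would use.

```python
from statistics import mean, stdev
from typing import Iterable, Iterator, NamedTuple


class Transaction(NamedTuple):
    """Hypothetical event type; a real stream would carry many more fields."""
    tx_id: str
    amount: float


def learn_threshold(historical_amounts: list[float], k: float = 3.0) -> float:
    """Batch step: derive a simple pattern from historical data.

    The rule (mean + k sample standard deviations) is an illustrative
    stand-in for a trained analytic model.
    """
    return mean(historical_amounts) + k * stdev(historical_amounts)


def score_stream(
    events: Iterable[Transaction], threshold: float
) -> Iterator[tuple[Transaction, bool]]:
    """Streaming step: flag each event as it arrives, one at a time."""
    for tx in events:
        yield tx, tx.amount > threshold


# Made-up historical amounts and incoming events.
history = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0]
threshold = learn_threshold(history)

incoming = [Transaction("t1", 26.0), Transaction("t2", 500.0)]
flags = [(tx.tx_id, suspicious) for tx, suspicious in score_stream(incoming, threshold)]
```

Because `score_stream` is a generator, each event is scored as soon as it is read rather than after a batch has accumulated; a stream processing engine plays the same role at scale.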
Keywords: Big Data, Fast Data, Machine Learning, Analytics, Analytic Model, Stream Processing, Event Processing, Streaming Analytics, Real Time, Hadoop, Spark, MLlib, Streaming, R, TERR, TIBCO, Spotfire, StreamBase, Live Datamart, H2O, Predictive Analytics, Data Discovery, Insights, Patterns
Moving Targets: Harnessing Real-time Value from Data in Motion Inside Analysis
The Briefing Room with David Loshin and Datawatch
Live Webcast Feb. 17, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=4a053043c45cf0c2f6453dfb8577c72a
Patience may be a virtue, but when it comes to streaming analytics, waiting is no option. Between Big Data and the Internet of Things, businesses are faced with more data and greater complexity than ever before. Traditional information architectures simply cannot support the kind of processing necessary to make use of this fast-moving resource. The modern context requires a shorter path to analytics, one that narrows the gap between governance and discovery.
Register for this episode of The Briefing Room to hear veteran Analyst David Loshin as he explains how the prevalence of streaming data is changing business pace and processes. He’ll be briefed by Dan Potter of Datawatch, who will tout his company’s real-time data discovery platform for data in motion. He will show how self-service data preparation can lead to faster insights, ultimately fostering the ability to make precise decisions at the right time.
Visit InsideAnalysis.com for more information.
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...BigDataEverywhere
Hadoop use cases have historically trended towards cost reduction through data warehouse offload. More recently, an uptick in customer-centric use cases has proven the ability of Hadoop to drive top-line revenue. In this session, Platfora solution architect Rob Rosen will discuss how the ability to correlate multi-structured data in Hadoop leads to greater customer adoption, expanded cross-selling and reduced customer churn for enterprises deploying Hadoop-centric data lakes.
From an experiment to a real production environmentDataWorks Summit
Rabobank is a worldwide food- and agri-bank from the Netherlands. Rabobank wants to make a substantial contribution to welfare and prosperity in the Netherlands and to feeding the world sustainably. Rabobank Group operates through Rabobank and its subsidiaries in 40 countries.
Rabobank is active in both retail and wholesale banking. For our wholesale clients we provide real-time business insight information by making use of Cloudera and Hortonworks technology. An example is our recently launched service that gives insight in market performance of Rabobank customers, starting with the dairy farmers market segment, by making use of benchmark information. Our current technology stack contains Hortonworks Data Flow (HDF) and Cloudera Hadoop (CDH). Our real-time data stream is implemented by making use of Kafka and Nifi from HDF. Cloudera is used to store the data needed for the business insight information, mainly in HDFS and HBase.
During our presentation we will provide insight into the project approach, the architecture and the actual implementation.
Speaker
Jeroen Wolffensperger, Solution Architect Data, Rabobank
Martijn Groen, Delivery Manager Data, Rabobank Netherlands
Similar to Stream Processing as Game Changer for Big Data and Internet of Things by Kai Wahner (20)
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
Insights can only be as good as the data. The data quality domain is enormously large, so you need to understand your company pain points to know what to focus on first.
https://www.bigdataspain.org/2017/talk/big-data-big-quality
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
2gether is a financial platform based on Blockchain, Big Data and Artificial Intelligence that allows interaction between users and third-party services in a single interface.
https://www.bigdataspain.org/2017/talk/scaling-a-backend-for-a-big-data-and-blockchain-environment
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
All modern Big Data solutions, like Hadoop, Kafka or the rest of the ecosystem tools, are designed as distributed processes and as such include some sort of redundancy for High Availability.
https://www.bigdataspain.org/2017/talk/disaster-recovery-for-big-data
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
In this presentation, attendees will see how to speed up existing Hadoop and Spark deployments by just making Apache Ignite responsible for RAM utilization. No code modifications, no new architecture from scratch!
https://www.bigdataspain.org/2017/talk/boost-hadoop-and-spark-with-in-memory-technologies
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
The power of this new set of tools for Data Science. It is really easy to start applying these techniques in your current workflow.
https://www.bigdataspain.org/2017/talk/data-science-for-lazy-people-automated-machine-learning
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
GPUs in the cloud as Infrastructure as a Service (IaaS) seem a commodity. However, efficiently distributing deep learning tasks across several GPUs is challenging.
https://www.bigdataspain.org/2017/talk/training-deep-learning-models-on-multiple-gpus-in-the-cloud
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
Unbalanced data is a specific data configuration that appears commonly in nature. Applying machine learning techniques to this kind of data is a difficult process, usually addressed by unbalanced reduction techniques.
https://www.bigdataspain.org/2017/talk/unbalanced-data-same-algorithms-different-techniques
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
Time series related problems have traditionally been solved using engineered features obtained by heuristic processes.
https://www.bigdataspain.org/2017/talk/state-of-the-art-time-series-analysis-with-deep-learning
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
Not long ago, only banks and hedge funds could afford automated and high-frequency trading, that is, the ability to send buy orders for commodities at microsecond intervals.
https://www.bigdataspain.org/2017/talk/trading-at-market-speed-with-the-latest-kafka-features
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
The shift to stream processing at LinkedIn has accelerated over the past few years. We now have over 200 Samza applications in production processing more than 260B events per day.
https://www.bigdataspain.org/2017/talk/apache-samza-jake-maes
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
IBM has built a “Data Science Experience” cloud service that exposes Notebook services at web scale.
https://www.bigdataspain.org/2017/talk/the-analytic-platform-behind-ibms-watson-data-platform
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
Artificial Intelligence and Data-centric businesses.
https://www.bigdataspain.org/2017/talk/tbc
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
Ten years ago there were rumours of the death of causal inference. Big data was supposed to enable us to rely on purely correlational data to predict and control the world.
https://www.bigdataspain.org/2017/talk/why-big-data-didnt-end-causal-inference
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
The Meme of the Internet Index will be the new normal to analyze and predict fads and sensations which go around the Internet.
https://www.bigdataspain.org/2017/talk/meme-index-analyzing-fads-and-sensations-on-the-internet
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
Geotab is a leader in the expanding world of Internet of Things (IoT) and telematics industry with Big Data.
https://www.bigdataspain.org/2017/talk/vehicle-big-data-that-drives-smart-city-advancement
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
The talk will focus on explaining why operational databases do not scale due to limitations in legacy transactional management.
https://www.bigdataspain.org/2017/talk/end-of-the-myth-ultra-scalable-transactional-management
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
In recent years Machine Learning (ML) and especially Deep Learning (DL) have achieved great success in many areas such as visual recognition, NLP or even aiding in medical research.
https://www.bigdataspain.org/2017/talk/attacking-machine-learning-used-in-antivirus-with-reinforcement
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
The primary function of the banking sector is promoting economic activity, which means “commerce”: exchanging what someone produces or has for something that someone else consumes or desires.
https://www.bigdataspain.org/2017/talk/more-people-less-banking-blockchain
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
Bol.com has been an early Hadoop user, since 2008, when it was first built for a recommendation algorithm.
https://www.bigdataspain.org/2017/talk/make-the-elephant-fly-once-again
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The new frontiers of AI in RPA with UiPath Autopilot™UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
9. Data Monitoring
Predictive Analytics - Fault Management
Electric Submersible Pumps (ESP)
• Motor temperature
• Motor vibration
• Current
• Intake pressure
• Intake temperature
• Flow
Diagram labels: Electrical power cable, Pump, Intake, Protector, ESP motor, Pump monitoring unit
18. Scenario: Predictive Scrapping of Parts in an Assembly Line
Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?
Cost before Station 1: 9€
Station 1: 7€ (Scrap?)
Station 2: 13€ (Scrap?)
Total cost: 29€ (or more)
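The scrap question can be framed as a small expected-value calculation. The sketch below reads the slide's figures as 9€ spent before Station 1, 7€ at Station 1 and 13€ at Station 2 (an interpretation; they sum to the 29€ total). The decision rule, the predicted defect probability `p_defect` and the value of a good part `part_value` are hypothetical and only illustrate how a predictive model's output could drive the decision.

```python
# Figures read from the slide (euros); the split per stage is an interpretation.
COST_BEFORE = 9.0
COST_STATION_1 = 7.0
COST_STATION_2 = 13.0

# Cost still to be spent if a part continues from the given stage.
REMAINING_COST = {
    "station_1": COST_STATION_1 + COST_STATION_2,
    "station_2": COST_STATION_2,
}


def should_scrap(p_defect: float, stage: str, part_value: float) -> bool:
    """Scrap when the expected value of continuing is negative.

    p_defect and part_value are hypothetical inputs: a model's predicted
    probability that the part ends up defective, and the value of a good part.
    Money already spent is sunk and does not enter the decision.
    """
    expected_gain = (1.0 - p_defect) * part_value - REMAINING_COST[stage]
    return expected_gain < 0


total_cost = COST_BEFORE + COST_STATION_1 + COST_STATION_2
likely_bad = should_scrap(p_defect=0.9, stage="station_1", part_value=40.0)
likely_good = should_scrap(p_defect=0.1, stage="station_1", part_value=40.0)
```

With these made-up inputs, a part with a 90% predicted defect probability is scrapped at Station 1, which avoids the remaining 20€ of processing, while a part with a 10% probability continues.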
40. Big Data Reference Architecture
Diagram labels:
• Augmented Intelligence, Operations, Live UI
• Sources: Sensor Data, Transactions, Message Bus, Machine Data, Social Data
• Streaming Analytics: Action, Aggregate, Rules, Stream Processing, Analytics, Correlate, Continuous query processing, Alerts, Manual action / escalation
• Data Discovery: Python, R, Data Scientists, Cleansed Data, History, Visual Analytics, Spark, H2O.ai
• Integration: ERP, MDM, DB, WMS, SOA / Microservices, Internal Data, Integration Bus, API, Event Server
• Big Data: Data Warehouse, Hadoop
49. Dataflow Streaming Pipeline (Ingest, Preprocess)
(Diagram labels are identical to those of slide 40, Big Data Reference Architecture.)
51. Streaming Analytics
(Diagram labels are identical to those of slide 40, Big Data Reference Architecture.)
69. Dataflow Streaming Pipeline + Streaming Analytics
(Diagram labels are identical to those of slide 40, Big Data Reference Architecture.)
74. Live UI for Augmented Intelligence
83. Data Discovery via Visual Analytics, Big Data and Machine Learning
84. Find Insights and Patterns in Historical Data
Visual Analytics + Machine Learning
85. Apply Insights and Analytic Models to Proactive Actions
Streaming Analytics
H2O.ai
Open Source R
TERR
Spark ML
MATLAB
SAS
PMML
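One way to picture "apply analytic models to proactive actions" is a model fitted offline on historical data and then scored against each live event. The sketch below assumes a logistic model whose coefficients were exported from the batch side; the `MODEL` values, the feature names, and the `proactive_actions` helper are all hypothetical and do not come from the slides or from any of the tools listed above.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical coefficients standing in for a model trained on historical data.
MODEL = {"intercept": -4.0, "weights": {"temperature": 0.05, "vibration": 1.2}}


def score(event: Dict[str, float], model: dict = MODEL) -> float:
    """Return the modelled probability for one event (logistic function)."""
    z = model["intercept"] + sum(
        weight * event[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def proactive_actions(
    events: Iterable[dict], cutoff: float = 0.5
) -> Iterator[Tuple[str, float]]:
    """Yield (event id, probability) for every event scored at or above the cutoff."""
    for event in events:
        probability = score(event)
        if probability >= cutoff:
            yield event["id"], probability
```

Keeping the model as plain data separates training (batch, historical) from scoring (streaming), so a retrained model can replace `MODEL` without touching the scoring loop.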
87. Case Study: Streaming Analytics for Betting
• Situation: Today, 80% of Betting is Done After the Game Starts
• It’s not your father’s bookie anymore!
• Problem: How to Analyze Big Betting Data?
• Thousands of concurrent games, constantly adjusting odds, dozens of betting networks: firms must correlate millions of events a day to find the best betting opportunities in real time
• Solution: TIBCO for Fast Data Architecture
• TXOdds uses TIBCO to correlate, aggregate, and analyze large volumes of streaming betting data in real time and publish innovative predictive betting analytics to their customers
• Result: TXOdds First to Market with Innovative Zero Latency Betting Analytics
• Innovative real-time analytics give players who can process electronic data in real time the edge
“With StreamBase, in two months we had our first betting analytics feed live, and we continually deploy new ideas and evolve our old ones.”
- Alex Kozlenkov, VP of technology, TXOdds
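The correlation step described in this case study (many betting networks, constantly adjusting odds, one best opportunity per outcome) can be sketched as a small in-memory board. This is an illustration under stated assumptions, not TXOdds' or StreamBase's implementation: the `OddsBoard` class, the decimal-odds convention, and the identifiers are hypothetical.

```python
from typing import Dict, Tuple


class OddsBoard:
    """Keeps the latest quote per network and reports the current best odds."""

    def __init__(self) -> None:
        # (game, outcome) -> {network: latest decimal odds}
        self._quotes: Dict[Tuple[str, str], Dict[str, float]] = {}

    def update(
        self, game: str, outcome: str, network: str, odds: float
    ) -> Tuple[str, float]:
        """Record one odds update and return (best network, best odds)."""
        quotes = self._quotes.setdefault((game, outcome), {})
        quotes[network] = odds  # a newer quote replaces the network's old one
        best_network = max(quotes, key=quotes.get)
        return best_network, quotes[best_network]
```

Each call to `update` handles a single streamed event, so the best quote stays current even when the network that used to lead lowers its odds.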
89. Real-Time Social Media Analytics
Twitter (#TomBradyBrokenLeg)
Twitter (#Boston)
Brady’s Stats
Actionable Insights
Twitter (#NFL)
Something relevant happening?
Every second counts!
Change Odds (automated or manually triggered):
Stop live-betting for the current running game?
• Who will win the game?
• How many interceptions will the Quarterback throw?
• Will the Patriots win the Super Bowl?
• …
92. Scenario: Predictive Scrapping of Parts in an Assembly Line
Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.
Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?
Cost before Station 1: 9€
Station 1: 7€
Station 2: 13€
Total cost: 29€ (or more)
Decision at each station: Scrap?
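The cost figures on this slide can be turned into a small worked example. The stage costs (9€, 7€, 13€) come from the slide; the decision rule, the defect probability, and the value of a finished good part are assumptions added for illustration, since the slides do not state how the scrap decision is actually made.

```python
COST_BEFORE = 9.0  # euros spent before the part reaches Station 1 (from the slide)
STATION_COSTS = [("Station 1", 7.0), ("Station 2", 13.0)]  # euros, from the slide


def cost_through(station_name: str) -> float:
    """Accumulated cost of a part once the named station has processed it."""
    total = COST_BEFORE
    for name, cost in STATION_COSTS:
        total += cost
        if name == station_name:
            return total
    raise ValueError(f"unknown station: {station_name}")


def remaining_cost_after(station_name: str) -> float:
    """Cost still to be spent if the part continues past the named station."""
    return cost_through(STATION_COSTS[-1][0]) - cost_through(station_name)


def should_scrap(p_defect: float, remaining_cost: float, good_part_value: float) -> bool:
    """Hypothetical rule: scrap when continuing has a negative expected value.

    p_defect is a model's predicted probability that the part ends up defective;
    good_part_value is an assumed value of a finished good part.
    """
    expected_value_of_continuing = (1.0 - p_defect) * good_part_value - remaining_cost
    return expected_value_of_continuing < 0.0
```

Scrapping a part at Station 1 avoids the 13€ that Station 2 would add to the 16€ already spent, which is why an early, reliable defect prediction reduces cost.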
93. Big Data Architecture for Predictive Maintenance
Diagram labels (figure not reproduced):
• Operational Analytics, Operations: Live UI
• Inputs: CSV (batch), JSON (real time), XML (real time)
• Streaming Analytics: Aggregate, Rules, Analytics, Correlate, Live Datamart, Continuous query processing
• Action: Alerts; Manual action, escalation
• Historical Analysis: Data Scientists, Spotfire, R / TERR, H2O
• Storage and ingestion: Flume, HDFS, Hadoop (Cloudera), Oracle RDBMS
• Formats: Avro, Parquet, …, PMML
• Platform: StreamBase, TIBCO Fast Data Platform
• Internal Data
96. Monitor Patterns → TIBCO Live Datamart
Augmented Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Desktop Client
97. Monitor Patterns → TIBCO Live Datamart
Augmented Intelligence (“Monitor the manufacturing process and change rules in real time!”)
Live Datamart Web API
98. TIBCO Spotfire + StreamBase + Live Datamart + H2O.ai
Live Demo