What is the future of Hadoop?
What is the new future of Hadoop?
How is that different from the old one?
Here is how I answered these questions at the winter Hadoop Conference of Japan 2013.
I gave this talk at Buzzwords just now to fill in for an ill speaker.
The topics include things that are being added to or taken out of Mahout. These include cruft (out), fast clustering (in), nearest neighbor search (in), Pig bindings for Mahout (who knows).
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...Channy Yun
YUN, SEOKCHAN, 1997, The Construction of the Internet Geological Data System Using WWW+Java+DB Technique, Tertiary Deposits of Korea, AAPG Annual Convention Abstracts, Association of American Petroleum Geologists 1997.4.23-26, Dallas, TX, USA, p.420
I gave this talk at Buzzwords just now to fill in for an ill speaker.
The topics include things that are being added to or taken out of Mahout. These include cruft (out), fast clustering (in), nearest neighbor search (in), Pig bindings for Mahout (who knows).
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...Channy Yun
YUN, SEOKCHAN, 1997, The Construction of the Internet Geological Data System Using WWW+Java+DB Technique, Tertiary Deposits of Korea, AAPG Annual Convention Abstracts, Association of American Petroleum Geologists 1997.4.23-26, Dallas, TX, USA, p.420
The unification of big and little data processing onto a single platform is an important requirement for Hadoop. How can this be achieved? I explain what is needed for three important use cases.
What is the future of Hadoop?
What is the new future of Hadoop?
How is that different from the old one?
Here is how Ted Dunning answered these questions at the winter Hadoop Conference of Japan 2013.
The unification of big and little data processing onto a single platform is an important requirement for Hadoop. How can this be achieved? Ted Dunning explains what is needed for three important use cases.
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present both a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we’ll examine the rise of IoT and ML from a practitioners perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial heating and cooling HVAC system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML enhanced applications and data products for instrumented manufacturing systems.
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
Examine the unique features of the MapR Converged Data Platform and how they can support production-grade enterprise machine learning - Ends with a live demo using H2O - Presented at Hadoop Summit Tokyo 2016
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
Predictive Maintenance Using Recurrent Neural NetworksJustin Brandenburg
My presentation from AnacondaCON 2018 where I discussed using Recurrent Neural Networks, Python, Tensorflow and the MapR Platform to develop deploy a predictive maintenance model for an IoT device in the manufacturing industry.
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
Recent research has pointed out the complementary nature of Hadoop and other data management solutions and the importance of leveraging existing systems, SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve analytic processing. Come to this session to learn how companies optimize the use of Hadoop with other enterprise systems to improve overall analytical throughput and build new data-driven products. This session covers: ways to achieve high-performance integration between Hadoop and relational-based systems; Hadoop+NoSQL vs Hadoop+SQL architectures; high-speed, massively parallel data transfer to analytical platforms that can aggregate web log data with granular fact data; and strategies for freeing up capacity for more explorative, iterative analytics and ad hoc queries.
We describe an application of CEP using a microservice-based streaming architecture. We use Drools business rule engine to apply rules in real time to an event stream from IoT traffic sensor data.
We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data processing lineage graph. Further, finding examples of significant cooccurrence of propagated and original metadata gives us the basis of an interesting kind of search engine gives interesting recommendations of data given a problem statement even in a near cold-start situation.
The unification of big and little data processing onto a single platform is an important requirement for Hadoop. How can this be achieved? I explain what is needed for three important use cases.
What is the future of Hadoop?
What is the new future of Hadoop?
How is that different from the old one?
Here is how Ted Dunning answered these questions at the winter Hadoop Conference of Japan 2013.
The unification of big and little data processing onto a single platform is an important requirement for Hadoop. How can this be achieved? Ted Dunning explains what is needed for three important use cases.
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present both a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we’ll examine the rise of IoT and ML from a practitioners perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial heating and cooling HVAC system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML enhanced applications and data products for instrumented manufacturing systems.
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
Examine the unique features of the MapR Converged Data Platform and how they can support production-grade enterprise machine learning - Ends with a live demo using H2O - Presented at Hadoop Summit Tokyo 2016
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
Predictive Maintenance Using Recurrent Neural NetworksJustin Brandenburg
My presentation from AnacondaCON 2018 where I discussed using Recurrent Neural Networks, Python, Tensorflow and the MapR Platform to develop deploy a predictive maintenance model for an IoT device in the manufacturing industry.
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
Recent research has pointed out the complementary nature of Hadoop and other data management solutions and the importance of leveraging existing systems, SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve analytic processing. Come to this session to learn how companies optimize the use of Hadoop with other enterprise systems to improve overall analytical throughput and build new data-driven products. This session covers: ways to achieve high-performance integration between Hadoop and relational-based systems; Hadoop+NoSQL vs Hadoop+SQL architectures; high-speed, massively parallel data transfer to analytical platforms that can aggregate web log data with granular fact data; and strategies for freeing up capacity for more explorative, iterative analytics and ad hoc queries.
We describe an application of CEP using a microservice-based streaming architecture. We use Drools business rule engine to apply rules in real time to an event stream from IoT traffic sensor data.
We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data processing lineage graph. Further, finding examples of significant cooccurrence of propagated and original metadata gives us the basis of an interesting kind of search engine gives interesting recommendations of data given a problem statement even in a near cold-start situation.
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, this complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures and access to storage from compute containers is extremely fast. In some environments, it’s even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity, because applications can be more Agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Ellen Friedman and I spoke at the ACM meetup about how stream-first architecture can have a big impact and how the logistics of machine learning is a great example of that impact.
This is my half of the presentation.
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
Tensors are a very useful tool for mathematical programming. Moreover, the optimization frameworks that are part of most machine learning frameworks have some very cool uses outside of the normal machine learning kinds of tasks.
The logistics of machine learning typically take waaay more effort than the machine learning itself. Moreover, machine learning systems aren't like normal software projects so continuous integration takes on new meaning.
You know that a single number isn't a good summary of a measurement. T-digest and other non-uniform histograms can make it easy to keep track of an entire distribution and can be combined in OLAP queries.
The latest t-digest is faster, more accurate and has hard bounds on size.
This talk shows practical methods for find changes in a variety of kinds of data as well as giving real-world examples from finance, telecom, systems monitoring and natural language processing.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
This talk focuses on how larger data sets are not only enabling advanced techniques, but also increasing the number of problems within reach of relatively simple techniques, that is "cheap learning".
These are the slides from my talk at FAR Con in Minneapolis recently. The topics are the implications of buried treasure hoards on data security, horror stories and new, simpler and provably secure methods for public data disclosure.
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
How the Internet of Things is Turning the Internet Upside DownTed Dunning
This is a wide-ranging talk that goes into how the internet is architected, how that architecture is changing as a result of internet of things, how the internet of things worked in the 19th century big data, open-source community and how to build time-series databases to make this all possible.
Really.
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
Apache Kylin (incubating) is a new project to bring OLAP cubes to Hadoop. I walk through the project and describe how it works and how users see the project.
Many statistics are impossible to compute precisely on streaming data. There are some very clever algorithms, however, which allow us to compute very good approximations of these values efficiently in terms of CPU and memory.
Anomaly Detection - New York Machine LearningTed Dunning
Anomaly detection is the art of finding what you don't know how to ask for. In this talk, I walk through the why and how of building probabilistic models for a variety of problems including continuous signals and web traffic. This talk blends theory and practice in a highly approachable way.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP