The document provides an overview of Apache Hadoop and common use cases. It describes how Hadoop is well-suited for log processing due to its ability to handle large amounts of data in parallel across commodity hardware. Specifically, it allows processing of log files to be distributed per unit of data, avoiding bottlenecks that can occur when trying to process a single large file sequentially.
Real World Use Cases: Hadoop and NoSQL in Production - Codemotion
"Real World Use Cases: Hadoop and NoSQL in Production" by Tugdual Grall.
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and a biometric database are some of the examples presented.
Building a Big Data platform with the Hadoop ecosystem - Gregg Barrett
This presentation provides a brief insight into a Big Data platform using the Hadoop ecosystem.
To this end the presentation will touch on:
-views of the Big Data ecosystem and its components
-an example of a Hadoop cluster
-considerations when selecting a Hadoop distribution
-some of the Hadoop distributions available
-a recommended Hadoop distribution
Big Data Expo 2015 - Hortonworks Common Hadoop Use Cases - BigDataExpo
When evaluating Apache Hadoop, organizations often identify dozens of use cases for Hadoop but wonder where to start. With hundreds of customer implementations of the platform, we have seen that successful organizations start small in scale and small in scope. Join us in this session as we review common deployment patterns and successful implementations that will help guide you on your journey of cost optimization and new analytics with Hadoop.
How Big Data and Hadoop Integrated into BMC Control-M at CARFAX - BMC Software
Learn how CARFAX utilized the power of Control-M to help drive big data processing via Cloudera. See why it was a no-brainer to choose Control-M to help manage workflows through Hadoop, some of the challenges faced, and the benefits the business received by using an existing, enterprise-wide workload management system instead of choosing “yet another tool.”
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop - Hortonworks
Real-time monitoring requires a highly scalable infrastructure: a message bus, a database, distributed event processing, and a scalable analytics engine. By bringing together the leading open source projects Apache Kafka, Apache HBase, Apache Storm, and Apache Hive, the Hortonworks Data Platform offers a comprehensive real-time analysis platform. In this session, we will provide an in-depth overview of all the key technology components and demonstrate a working solution for monitoring a fleet of trucks.
Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=0278dc8aa49a9991e1ce436c71f53d30
Hadoop Reporting and Analysis - Jaspersoft - Hortonworks
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
Near real-time big data analytics is a reality via a new data pattern that avoids the latency and overhead of legacy ETL: the 3 T’s of Hadoop: Transfer, Transform, and Translate. Transfer: once a Hadoop infrastructure is in place, a mandate is needed to immediately and continuously transfer all enterprise data, from external and internal sources and through different existing systems, into Hadoop. Previously, enterprise data was isolated, disconnected, and monolithically segmented. Through this T, various source data are consolidated and centralized in Hadoop almost as they are generated, in near real time. Transform: most enterprise data, when flowing into Hadoop, is transactional in nature. Analytics requires that data be transformed from record-based OLTP form to column-based OLAP form. This T is not the same T as in ETL, because we need to retain the granularity of the data feeds. The key is to transform in place within Hadoop, without further data movement from Hadoop to other legacy systems. Translate: we pre-compute or provide on-the-fly views of analytical data, exposed for consumption. We facilitate analysis and reporting, for both scheduled and ad hoc needs, that is interactive with the data for analysts and end users, integrated in and on top of Hadoop.
Slides from the joint webinar. Learn how Pivotal HAWQ, one of the world’s most advanced enterprise SQL-on-Hadoop technologies, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your data science efforts.
Together, Pivotal HAWQ and the Hortonworks Data Platform provide businesses with a Modern Data Architecture for IT transformation.
YARN Ready: Integrating to YARN with Tez - Hortonworks
The YARN Ready webinar series helps developers integrate their applications with YARN. Tez is one vehicle to do that. We take a deep dive, including a code review, to help you get started.
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar) - Hortonworks
Whether you are new to Hadoop or have a mature cluster in production, scale will be a critical factor in your success with Hadoop. Are you ready to take the next big step as you scale out your data architecture?
Talend and Hortonworks discuss how to implement an effective big data and Hadoop strategy across your IT infrastructure. You will learn:
How to grow a pilot into production
How to scale-out architecture & systems affordably
How to leverage the flexibility of Hadoop to optimize your data integration processes
Recording: http://www.talend.com/resources/webinars/starting-small-and-scaling-big-with-hadoop
YARN webinar series: Using Scalding to write applications to Hadoop and YARN - Hortonworks
This webinar introduces Scalding for developers and covers writing applications for Hadoop and YARN with it. Guest speaker Jonathan Coveney from Twitter provides an overview, use cases, limitations, and core concepts.
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc... - Agile Testing Alliance
"Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Processing" by Sampat Kumar from Harman. The presentation was given at the #doppa17 DevOps++ Global Summit 2017. All copyrights are reserved by the author.
Learn how organizations that combine the HP Vertica Analytics Platform with Hortonworks can quickly explore and analyze a broad variety of data types, transforming them into actionable information that allows them to better understand how their customers and site visitors interact with their business, offline and online.
Hortonworks YARN Code Walk Through, January 2014 - Hortonworks
This slide deck accompanies the webinar recording "YARN Code Walk Through" from Jan. 22, 2014, available on Hortonworks.com/webinars under Past Webinars, or at:
https://hortonworks.webex.com/hortonworks/lsr.php?AT=pb&SP=EC&rID=129468197&rKey=b645044305775657
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python, and Scala, and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session, learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and fosters continuous learning and collaboration. We will show a demo of DSX with HDP, focusing on integration, security, and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data - Hortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
This talk illustrates the problem of collecting data from potential Big Data sources efficiently and at scale, followed by a survey of some of the most popular software that can be used in a pipeline for real-time data streaming and/or batch analysis.
Video: http://www.youtube.com/watch?v=BT8WvQMMaV0
Hadoop is the technology of choice for processing large data sets. At salesforce.com, we service internal and product big data use cases using a combination of Hadoop, Java MapReduce, Pig, Force.com, and machine learning algorithms. In this webinar, we will discuss an internal use case and a product use case:
Product Metrics: Internally, we measure feature usage using a combination of Hadoop, Pig, and the Force.com platform (Custom Objects and Analytics).
Community-Based Recommendations: In Chatter, our most successful people and file recommendations are built on a collaborative filtering algorithm that is implemented on Hadoop using Java MapReduce.
Enrich a 360-degree Customer View with Splunk and Apache Hadoop - Hortonworks
What if your organization could obtain a 360-degree view of the customer across offline, online, social, and mobile channels? Attend this webinar with Splunk and Hortonworks to see examples of how marketing, business, and operations analysts can reach across disparate data sets in Hadoop to spot new opportunities for up-sell and cross-sell. We'll also cover examples of how to measure buyer sentiment and changes in buyer behavior, along with best practices for using data in Hadoop with Splunk to assign customer influence scores that online, call-center, and retail branches can use to customize more compelling products and promotions.
Fundamentals of Big Data, Hadoop project design, and a case study / use case.
General planning considerations and essentials for the Hadoop ecosystem and Hadoop projects.
This provides the basis for choosing the right Hadoop implementation, integrating Hadoop technologies, driving adoption, and creating an infrastructure.
Building applications using Apache Hadoop, with Wi-Fi log analysis as a real-life example.
Building a reliable pipeline of data ingress, batch computation, and data egress with Hadoop can be a major challenge. Most folks start out with cron to manage workflows, but soon discover that it doesn't scale past a handful of jobs. There are a number of open-source workflow engines with support for Hadoop, including Azkaban (from LinkedIn), Luigi (from Spotify), and Apache Oozie. Having deployed all three of these systems in production, Joe will talk about what features and qualities are important for a workflow system.
With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that connects the systems powering business transactions and business intelligence. Hadoop is uniquely capable of storing, aggregating, and refining multi-structured data sources into formats that fuel new business insights. Apache Hadoop is fast becoming the de facto platform for processing Big Data. Hadoop started from a relatively humble beginning as a point solution for small search systems. Its growth into an important technology for the broader enterprise community dates back to Yahoo's 2006 decision to evolve Hadoop into a system for solving its internet-scale big data problems. Eric will discuss the current state of Hadoop and what is coming from a development standpoint as Hadoop evolves to meet more workloads.
Taming the Beasts: Tools for Managing and Monitoring Distributed Syst... - yaevents
Alexander Kozlov, Cloudera Inc.
Alexander Kozlov, a senior architect at Cloudera Inc., works with large companies, many of them in the Fortune 500, on projects building systems for analyzing large volumes of data. He completed graduate studies at the Physics Department of Moscow State University and then earned a Ph.D. at Stanford. After finishing his studies and before Cloudera, he worked on statistical data analysis and related computing technologies at SGI, Hewlett-Packard, and the startup Turn.
Talk topic
Taming the Beasts: Tools for Managing and Monitoring Distributed Systems from Cloudera.
Abstract
Maintaining distributed systems consisting of thousands of computers is a complex task. Cloudera, which specializes in building distributed technologies, has developed a set of tools for centralized management of distributed Hadoop/HBase clusters. Hadoop and HBase are Apache Software Foundation projects, and their adoption for analyzing semi-structured data is accelerating worldwide. This talk covers SCM, a system for configuring, tuning, and managing Hadoop/HBase, and Activity Monitor, a system for monitoring a range of OS and Hadoop/HBase metrics, as well as how Cloudera's approach differs from existing monitoring solutions (Tivoli, xCat, Ganglia, Nagios, etc.).
Hadoop makes data storage and processing at scale available as a lower-cost, open solution. If you ever wanted to get your feet wet but found the elephant intimidating, fear no more.
We will explore several integration considerations from a Windows application perspective, like accessing HDFS content, writing streaming jobs, and using the .NET SDK, as well as HDInsight on premises or on Azure.
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R... - Dataconomy Media
What is Big Data? What is Hadoop? What is MapReduce? How do other components such as Oozie, Hue, Hive, and Impala work? What are the main Hadoop distributions? What is Spark? What are the differences between batch and streaming processing? And what are some business intelligence solutions, with a focus on business cases?
Hadoop is emerging as the preferred solution for big data analytics across unstructured data. Using real-world examples, learn how to achieve a competitive advantage by finding effective ways of analyzing new sources of unstructured and machine-generated data.
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc... - Cloudera, Inc.
"Amr Awadallah served as the VP of Engineering of Yahoo's Product Intelligence Engineering (PIE) team for a number of years. The PIE team was responsible for business intelligence and advanced data analytics across a number of Yahoo's key consumer-facing properties (search, mail, news, finance, sports, etc.). Amr will share the data architecture that PIE had implemented before Hadoop was deployed and the headaches that architecture entailed. Amr will then show how most, if not all, of these headaches were eliminated once Hadoop was deployed. Amr will illustrate how Hadoop and relational databases complement each other within the traditional business intelligence data stack, and how that enables organizations to access all their data under different operational and economic constraints."
Similar to Common and unique use cases for Apache Hadoop:
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI-powered automation technology capabilities of UiPath. Hosted by our local partner Marc Ellis, you will also enjoy a half-day packed with industry insights and networking with automation peers.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
2. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
Copyright 2011 Cloudera Inc. All rights reserved
3. Exploding Data Volumes
• Online: web-ready devices, social media
• Complex, Unstructured: digital content, smart grids
• Enterprise Relational: transactions, R&D data, operational (control) data
• The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 "zettabytes" this year; 2,500 exabytes of new information in 2012, with the Internet as the primary driver
Source: an IDC White Paper sponsored by EMC, "As the Economy Contracts, the Digital Universe Expands," May 2009
4. Origin of Hadoop
How does an elephant sneak up on you?
[Timeline, 2002-2010: Open Source Web Crawler project created by Doug Cutting; MapReduce & GFS papers published; Open Source MapReduce & HDFS project created by Doug Cutting; Hadoop wins the Terabyte sort benchmark; runs a 4,000 node Hadoop cluster; SQL support for Hadoop launches; Cloudera releases CDH3 and Cloudera Enterprise]
5. What is Apache Hadoop?
Open Source Storage and Processing Engine: MapReduce + the Hadoop Distributed File System (HDFS)
• Consolidates Everything: move complex and relational data into a single repository
• Stores Inexpensively: keep raw data always available; use commodity hardware
• Processes at the Source: eliminate ETL bottlenecks; mine data first, govern later
6. What is Apache Hadoop?
The Standard Way Big Data Gets Done
• Hadoop is Flexible:
  • Structured, unstructured
  • Schema, no schema
  • High volume, merely terabytes
  • All kinds of analytic applications
• Hadoop is Open: 100% Apache-licensed open source
• Hadoop is Scalable: proven at petabyte scale
• Benefits:
  • Controls costs by storing data more affordably per terabyte than any other platform
  • Drives revenue by extracting value from data that was previously out of reach
7. What is Apache Hadoop?
The Importance of Being Open
• No Lock-In: investments in skills, services & hardware are preserved regardless of vendor choice
• Community Development: Hadoop & related projects are expanding at a rapid pace
• Rich Ecosystem: dozens of complementary software, hardware and services firms
8. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
9. Log Processing
A Perfect Fit
• Common uses of logs:
  • Find or count events (grep):
    grep "ERROR" file
    grep -c "ERROR" file
  • Calculate metrics (performance or user behavior analysis):
    awk '{sums[$1]+=$2; counts[$1]+=1} END {for (k in counts) {print sums[k]/counts[k]}}'
  • Investigate user sessions:
    grep "USER" files … | sort | less
10. Log Processing
A Perfect Fit
• Shoot… too much data
• Homegrown parallel processing, often done on a per-file basis, 'cause it's easy
• No parallelism on a single large file
[Diagram: Task 0, Task 1, and Task 2 each process a separate access_log file]
11. Log Processing
A Perfect Fit
• MapReduce to the rescue!
• Processing is done per unit of data (see the sketch below)
[Diagram: Tasks 0-3 each read one block (0-64MB, 64-128MB, 128-192MB, 192-256MB) of a single access_log; each task is responsible for a unit of data]
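To make the block-per-task idea concrete, here is a minimal sketch, not from the deck, of a distributed grep -c over access_log in the standard org.apache.hadoop.mapreduce Java API; the "ERROR" pattern and the paths are illustrative. Each map task receives one input split, so the four blocks in the diagram above are scanned by four mappers in parallel, and a combiner plus reducer sum the per-split counts:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogGrep {

  // Each map task sees one split (e.g. one 64MB block) of the log file.
  public static class GrepMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final Text ERROR = new Text("ERROR");
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      if (line.toString().contains("ERROR")) {
        ctx.write(ERROR, ONE); // the per-split equivalent of grep -c
      }
    }
  }

  // The reducer (also used as a combiner) sums per-split counts into a total.
  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable c : counts) {
        total += c.get();
      }
      ctx.write(key, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "log grep");
    job.setJarByClass(LogGrep.class);
    job.setMapperClass(GrepMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /logs/access_log
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /tmp/grep-out
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}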
12. Log Processing
A Perfect Fit
• Network or disk are bottlenecks
• Reading 100GB of data takes:
  • 14 minutes over a 1GbE network connection
  • 22 minutes from a standard disk drive
[Diagram: grep pulls access_log across a link where bandwidth is limited]
13. Log Processing
A Perfect Fit
• Hadoop to the rescue!
• Eliminates the network bottleneck: data is on local disk
• Data is read from many, many disks in parallel
[Diagram: physical machines NodeA, NodeX, NodeY, and NodeZ each run one of Tasks 0-3 against their local block (0-64MB, 64-128MB, 128-192MB, 192-256MB)]
14. Log Processing
A Perfect Fit
• Hadoop currently scales to 4,000 nodes
  • The goal for the next release is 10,000 nodes
• Nodes typically have 12 hard drives
• A single hard drive has a throughput of about 75MB/second
• 12 hard drives * 75 MB/second * 4,000 nodes = 3,600,000 MB/second, roughly 3.4 TB/second
  • That's bytes, not bits
• That's enough bandwidth to read 1PB (1,000 TB) in 5 minutes
15. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
16. Catching `Osama'
Embarrassingly Parallel
• You have a few billion images of faces with geo-tags
• Tremendous storage problem
• Tremendous processing problem:
  • Bandwidth
  • Coordination
17. Catching `Osama'
Embarrassingly Parallel
• Store the images in Hadoop
• When processing, Hadoop will read the images from local disk: thousands of local disks spread throughout the cluster
• Use a Map-only job to compare input images against the `needle' image
18. Catching `Osama'
Embarrassingly Parallel
[Diagram: images are stored in SequenceFiles; Map Task 0 and Map Task 1 each have a copy of the `needle' image and output the faces `matching' the needle (see the sketch below)]
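A minimal sketch of what such a map-only job could look like, assuming the SequenceFiles hold (image id, image bytes) pairs and that a needle.path configuration property points at the needle image in HDFS; the byte-for-byte comparison is a deliberate stand-in for a real face-matching model:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NeedleMatch {

  public static class MatchMapper extends Mapper<Text, BytesWritable, Text, NullWritable> {
    private byte[] needle;

    @Override
    protected void setup(Context ctx) throws IOException {
      // Load the `needle' image once per task; every task gets its own copy.
      Path p = new Path(ctx.getConfiguration().get("needle.path"));
      FileSystem fs = p.getFileSystem(ctx.getConfiguration());
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      IOUtils.copyBytes(fs.open(p), buf, 4096, true);
      needle = buf.toByteArray();
    }

    @Override
    protected void map(Text imageId, BytesWritable image, Context ctx)
        throws IOException, InterruptedException {
      // Stand-in comparison; a real job would score the two faces with a model.
      byte[] bytes = Arrays.copyOf(image.getBytes(), image.getLength());
      if (Arrays.equals(bytes, needle)) {
        ctx.write(imageId, NullWritable.get()); // emit ids of `matching' faces
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("needle.path", args[0]);
    Job job = Job.getInstance(conf, "needle match");
    job.setJarByClass(NeedleMatch.class);
    job.setMapperClass(MatchMapper.class);
    job.setNumReduceTasks(0); // map-only: no shuffle, no reduce phase
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[1]));
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}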
19. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
20. Extract Transform Load (ETL)
Everyone is doing it
• One of the most common use cases I see is replacing ETL processes
• Hadoop is a huge sink of cheap storage and processing
• Aggregates are built in Hadoop and exported
• Apache Hive provides SQL-like querying on raw data (example below)
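As a small illustration of the Hive point, a SQL-like aggregate over raw log data can be issued from Java over JDBC. This sketch assumes a modern HiveServer2 endpoint rather than the 2011-era Hive server, and the host, table, and column names are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveAggregate {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // Hypothetical HiveServer2 endpoint and database.
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://hiveserver.example.com:10000/default", "", "");
    Statement stmt = conn.createStatement();
    // Hive compiles this into distributed jobs over the raw files.
    ResultSet rs = stmt.executeQuery(
        "SELECT status, COUNT(*) AS hits FROM access_log GROUP BY status");
    while (rs.next()) {
      System.out.println(rs.getString("status") + "\t" + rs.getLong("hits"));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}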
21. Extract Transform Load (ETL)
Everyone is doing it
[Diagram: a `Real' Time System (Website) with an Online DB feeds, via ETL, the Analytical DB of a Data Warehouse, which serves Business Intelligence Applications; the ETL arrow is labeled "Much blood shed here"]
22. Extract Transform Load (ETL)
Everyone is doing it
[Diagram: the same pipeline with Hadoop in the middle; the Online DB is imported into Hadoop, and Hadoop exports to the Analytical DB of the Data Warehouse]
23. Extract Transform Load (ETL)
Everyone is doing it
[Diagram: the same pipeline, with Apache Sqoop handling both the import from the Online DB into Hadoop and the export from Hadoop to the Analytical DB (sketch below)]
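Sqoop is normally driven from the command line; the same import/export pair can also be sketched in Java through Sqoop 1's Sqoop.runTool entry point. The connection strings, table names, and HDFS paths below are purely illustrative:

import org.apache.sqoop.Sqoop;

public class WarehouseSync {
  public static void main(String[] args) {
    // Import: copy the online DB's `orders' table into HDFS with parallel mappers.
    int rc = Sqoop.runTool(new String[] {
        "import",
        "--connect", "jdbc:mysql://onlinedb.example.com/shop",
        "--table", "orders",
        "--target-dir", "/warehouse/raw/orders",
        "--num-mappers", "4"
    });
    if (rc != 0) {
      System.exit(rc);
    }

    // Export: push aggregates built in Hadoop out to the analytical DB.
    rc = Sqoop.runTool(new String[] {
        "export",
        "--connect", "jdbc:mysql://dwh.example.com/analytics",
        "--table", "daily_order_totals",
        "--export-dir", "/warehouse/aggregates/daily_order_totals"
    });
    System.exit(rc);
  }
}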
24. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
25. Analytics in HBase
Scaling writes
• Analytics is often simply counting things
• Facebook chose HBase to store its massive counter infrastructure (more later)
• How might one implement a counter infrastructure in HBase?
26. Analytics in HBase
Scaling writes
• A `Like' button IMG request sends an HTTP request to Facebook servers, which increment several counters
User & Content Type Counters:
  User         | Content  | Counter
  brock@me.com | NEWS     | 5431
  brock@me.com | TECH     | 79310
  brock@me.com | SHOPPING | 59
  tom@him.com  | SPORTS   | 94214
Individual Page Counters:
  URL                      | Counter
  com.cloudera/blog/…      | 154
  com.cloudera/downloads/… | 923621
  com.cloudera/resources/… | 2138
27. Analytics in HBase
Scaling writes
• The host is reversed in the URL as part of the key:
Individual Page Counters:
  URL                      | Counter
  com.cloudera/blog/…      | 154
  com.cloudera/downloads/… | 923621
  com.cloudera/resources/… | 2138
• Data is physically stored in sorted order
• Scanning all `com.cloudera' counters results in sequential I/O (sketch below)
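A minimal sketch of one such counter write, using the HBase client API of that era (HTable and the incrementColumnValue call that slide 28 mentions); the table name, column family, and reversed-host row key are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class PageCounters {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable counters = new HTable(conf, "page_counters");

    // Reversing the host (blog.cloudera.com -> com.cloudera/blog/...) makes all
    // of a site's rows adjacent in HBase's sorted key space.
    byte[] rowKey = Bytes.toBytes("com.cloudera/blog/some-post");

    // Atomic server-side increment; returns the new counter value.
    long hits = counters.incrementColumnValue(
        rowKey,
        Bytes.toBytes("c"),    // column family holding the counters
        Bytes.toBytes("hits"), // one qualifier per counter
        1L);
    System.out.println("com.cloudera/blog/some-post = " + hits);
    counters.close();
  }
}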
28. Facebook Analytics
Scaling writes
• Real-time counters of URLs shared, links "liked", impressions generated
• 20 billion events/day (200K events/sec)
• ~30 second latency from click to count
• Heavy use of the incrementColumnValue API for consistent counters
• Tried MySQL and Cassandra, settled on HBase
http://tiny.cloudera.com/hbase-fb-analytics
29. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
33. Machine Learning
Apache Mahout
• Apache Mahout implements:
  • Collaborative Filtering (sketch below)
  • Classification
  • Clustering
  • Frequent itemset
• More coming with the integration of MapReduce.Next
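Mahout's collaborative filtering runs both as Hadoop jobs and through its in-memory Taste API; below is a minimal single-machine sketch of the latter, assuming a hypothetical ratings.csv of userID,itemID,preference rows:

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommenderSketch {
  public static void main(String[] args) throws Exception {
    // ratings.csv holds userID,itemID,preference rows (the file name is illustrative).
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    // Top 3 recommendations for user 1.
    List<RecommendedItem> recs = recommender.recommend(1, 3);
    for (RecommendedItem item : recs) {
      System.out.println(item.getItemID() + " @ " + item.getValue());
    }
  }
}

Swapping in a different neighborhood or similarity implementation changes the recommender's behavior without touching the rest of the pipeline.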
34. Agenda
• What is Apache Hadoop?
• Log Processing
• Catching `Osama'
• Extract Transform Load (ETL)
• Analytics in HBase
• Machine Learning
• Final Thoughts
35. Final Thoughts
Use the right tool
• Other use cases:
  • OpenTSDB, an open, distributed, scalable Time Series Database (TSDB)
  • Building search indexes (the canonical use case)
  • Facebook Messaging
  • Cheap and deep storage, e.g. archiving emails for SOX compliance
  • Audit logging
• Non-use cases:
  • Data processing is handled by one beefy server
  • Data requires transactions
36. About the Presenter
• Brock Noland
• brock@cloudera.com
• http://twitter.com/brocknoland
• TC-HUG: http://tch.ug