Pig is a platform for analyzing large datasets that sits between low-level MapReduce programming and high-level SQL querying. It provides a language, Pig Latin, that lets users specify data analysis programs without dealing with low-level execution details; Pig Latin scripts are compiled into sequences of MapReduce jobs. HCatalog complements this by letting Pig, Hive, and other tools share the same data through common metadata about schemas, locations, and formats.
3. Pig History
• Born in Yahoo! Research, then incubated at Apache
• Built to avoid low-level MapReduce programming without having to fall back on Hive/SQL queries
• Committers from Yahoo!, Hortonworks, LinkedIn, Salesforce, IBM, Twitter, Netflix, and others
• See also: Alan Gates on Pig
4. Pig
• An engine for executing programs on top of Hadoop
• It provides a language, Pig Latin, to specify these programs
5. HDP: Enterprise Hadoop Platform
Hortonworks Data Platform (HDP):
• The ONLY 100% open source and complete platform
• Integrates the full range of enterprise-ready services
• Certified and tested at scale
• Engineered for deep ecosystem interoperability
(Architecture diagram: Hadoop core – HDFS, YARN, MapReduce, Tez; data services – Hive & HCatalog, Pig, HBase, with Sqoop, Flume, NFS, and WebHDFS for load & extract; operational services – Oozie, Ambari, Falcon*, Knox*; platform services for enterprise readiness – high availability, disaster recovery, rolling upgrades, security, and snapshots; deployable on OS/VM, cloud, or appliance.)
6. Why use Pig?
• Suppose you have user data in one file, website data in another, and you need to find the top 5 most visited sites by users aged 18-25
8. In Pig Latin
Users = load 'input/users' using PigStorage(',') as (name:chararray, age:int);
Fltrd = filter Users by age >= 18 and age <= 25;
Pages = load 'input/pages' using PigStorage(',') as (user:chararray, url:chararray);
Jnd = join Fltrd by name, Pages by user;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;
store Top5 into 'output/top5sites' using PigStorage(',');

9 lines of code, 15 minutes to write: down from roughly 170 lines of equivalent MapReduce code.
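To run the finished script, it can be saved to a file and handed to the pig launcher; a minimal sketch, assuming the script above is saved as top5sites.pig (a hypothetical name) and the input paths exist:

# test quickly against the local filesystem
pig -x local top5sites.pig
# run on the cluster, compiled into MapReduce jobs
pig -x mapreduce top5sites.pig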
9. Essence of Pig
• MapReduce is too low a level, SQL too high
• Pig Latin is a language intended to sit between the two:
  – Provides standard relational transforms (join, sort, etc.)
  – Schemas are optional, used when available, and can be defined at runtime
  – User Defined Functions are first-class citizens
11. Pig Elements
• Pig Latin – high-level scripting language; requires no metadata or schema; statements are translated into a series of MapReduce jobs
• Grunt – interactive shell
• Piggybank – shared repository for User Defined Functions (UDFs)
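For a taste of Grunt, a short interactive session might look like the following (a sketch; the grunt> prompt is printed by the shell and the input path is hypothetical):

grunt> Users = load 'input/users' using PigStorage(',') as (name:chararray, age:int);
grunt> Fltrd = filter Users by age >= 18 and age <= 25;
grunt> describe Fltrd;
Fltrd: {name: chararray, age: int}
grunt> dump Fltrd;   -- triggers execution and prints the tuples to the screen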
12. Pig Latin Data Flow
LOAD (HDFS/HCat) -> TRANSFORM (Pig) -> DUMP or STORE (HDFS/HCat)
• LOAD – read the data to be manipulated from the file system
• TRANSFORM – manipulate the data
• DUMP or STORE – output the data to the screen, or store it for further processing
In code:
VARIABLE1 = LOAD [somedata]
VARIABLE2 = [TRANSFORM operation]
STORE VARIABLE2 INTO '[some location]'
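A concrete instance of that template (a minimal sketch; the file paths and field names are hypothetical):

raw    = LOAD 'input/employees.csv' USING PigStorage(',') AS (name:chararray, age:int, salary:double);
raised = FOREACH raw GENERATE name, salary * 1.1 AS newsalary;
STORE raised INTO 'output/raises' USING PigStorage(',');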
13. Pig Relations
Pig Latin statements work with relations:
1. A bag is a collection of unordered tuples (which can be of different sizes).
2. A tuple is an ordered set of fields.
3. A field is a piece of data.
(Diagram: a bag containing tuples, each tuple made up of fields.)
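In Pig's textual notation the nesting is visible directly: a bag is written with braces, a tuple with parentheses, and fields are the values inside. An illustrative value (not output from any particular dataset):

{(Alice,21,50000.0),(Bob,35)}

is a single bag holding two tuples of different sizes, matching point 1 above.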
14. FILTER, GROUP, FOREACH, ORDER
logevents = LOAD 'input/my.log' AS (date:chararray, level:chararray, code:int, message:chararray);
severe    = FILTER logevents BY (level == 'severe' AND code >= 500);
grouped   = GROUP severe BY code;

e1 = LOAD 'pig/input/File1' USING PigStorage(',') AS (name:chararray, age:int, zip:int, salary:double);
f  = FOREACH e1 GENERATE age, salary;
g  = ORDER f BY age;
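GROUP pairs each distinct key with a bag of the matching input tuples; describe makes the resulting schema visible. A sketch of what Pig would print for the log example above:

grunt> describe grouped;
grouped: {group: int, severe: {(date: chararray, level: chararray, code: int, message: chararray)}}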
15. JOIN, GROUP, LIMIT
employees = LOAD '[somefile]' AS (name:chararray, age:int, zip:int, salary:double);
agegroup  = GROUP employees BY age;
h         = LIMIT agegroup 100;

e1 = LOAD '[somefile]' USING PigStorage(',') AS (name:chararray, age:int, zip:int, salary:double);
e2 = LOAD '[somefile]' USING PigStorage(',') AS (name:chararray, phone:chararray);
e3 = JOIN e1 BY name, e2 BY name;
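Because both inputs carry a name field, Pig disambiguates the joined schema with the relation name and a :: prefix; a sketch of what describe would report for e3:

grunt> describe e3;
e3: {e1::name: chararray, e1::age: int, e1::zip: int, e1::salary: double, e2::name: chararray, e2::phone: chararray}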
18. Hive vs Pig
Pig and Hive work well together, and many businesses use both.
Hive is a good choice:
• when you want to query the data
• when you need an answer to specific questions
• if you are familiar with SQL
Pig is a good choice:
• for ETL (Extract -> Transform -> Load)
• for preparing data for easier analysis
• when you have a long series of steps to perform
20. T-SQL vs Hadoop Ecosystem
Feature                  T-SQL   Pig             Hive
Query data               Yes     Yes (in bulk)   Yes
Local variables          Yes     Yes             No
Conditional logic        Yes     Limited         Limited
Procedural programming   Yes     No              No
UDFs                     No      Yes             Yes
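The "Local variables" row for Pig most likely refers to parameter substitution; a minimal sketch (the parameter name is hypothetical):

-- overridable at launch time: pig -param INPUT=other/users script.pig
%default INPUT 'input/users'
Users = LOAD '$INPUT' USING PigStorage(',') AS (name:chararray, age:int);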
24. Tools With HCatalog
Feature         MapReduce + HCatalog                        Pig + HCatalog                                   Hive
Record format   Record                                      Tuple                                            Record
Data model      int, float, string, maps, structs, lists    int, float, string, bytes, maps, tuples, bags    int, float, string, maps, structs, lists
Schema          Read from metadata                          Read from metadata                               Read from metadata
Data location   Read from metadata                          Read from metadata                               Read from metadata
Data format     Read from metadata                          Read from metadata                               Read from metadata

• Pig/MR users can read the schema from metadata
• Pig/MR users are insulated from schema, location, and format changes
• All users have access to other users' data as soon as it is committed
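In Pig, using HCatalog amounts to swapping PigStorage for the HCatalog loader, so the script names a table rather than a path and takes its schema from the metastore. A minimal sketch (the table and field names are hypothetical; the loader package is org.apache.hcatalog.pig in older releases and org.apache.hive.hcatalog.pig in newer ones, and the script is launched with pig -useHCatalog):

-- schema, location, and format all come from HCatalog metadata: no AS clause needed
weblogs = LOAD 'default.weblogs' USING org.apache.hcatalog.pig.HCatLoader();
recent  = FILTER weblogs BY datestamp == '20140813';
STORE recent INTO 'default.weblogs_recent' USING org.apache.hcatalog.pig.HCatStorer();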
26. Data & Metadata REST Services APIs
(Diagram: existing and new applications call the WebHCat RESTful web services layer, which fronts HCatalog; MapReduce, Pig, and Hive work through HCatalog over HDFS, HBase, and external stores, and WebHDFS exposes HDFS the same way.)
WebHDFS & WebHCat provide a RESTful API as the "front door" for Hadoop:
• Opens the door to languages other than Java
• Thin clients via web services vs. fat clients in a gateway
• Insulation from interface changes release to release
Opens Hadoop to integration with existing and new applications.
27. RESTful API Access for Pig
• Code example:
curl -s -d user.name=hue \
     -d execute="<pig script>" \
     'http://localhost:50111/templeton/v1/pig'
• RestSharp (restsharp.org) – a simple REST and HTTP API client for .NET
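The submission call replies with a small JSON body containing the id of the launched job, which can then be polled for status; a sketch, assuming a WebHCat version that exposes job status under /templeton/v1/jobs (older releases used /templeton/v1/queue, and the job id shown is illustrative):

# submission response looks like: {"id":"job_201408131234_0001"}
curl -s 'http://localhost:50111/templeton/v1/jobs/job_201408131234_0001?user.name=hue'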
30. Hive-on-MR vs. Hive-on-Tez
SELECT a.state, COUNT(*), AVG(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state
(Diagram: on MapReduce the query runs as a chain of separate MR jobs, one for JOIN(a, b) over SELECT b.id, one for JOIN(a, c) over SELECT a.state, c.itemId, and one for GROUP BY a.state with COUNT(*) and AVG(c.price), each job writing its intermediate result to HDFS before the next reads it. On Tez the same stages run as a single DAG, passing data between tasks without the intermediate HDFS writes.)
Tez avoids unneeded writes to HDFS.
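On the Hive side, choosing between the two runtimes is a session-level setting; a short sketch using standard Hive configuration:

-- run subsequent queries on Tez rather than MapReduce
SET hive.execution.engine=tez;
-- switch back to classic MapReduce
SET hive.execution.engine=mr;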
31. Pig on Tez - Design
(Design diagram: a Pig script is parsed into a Logical Plan, which LogToPhyTranslationVisitor translates into a Physical Plan; from the Physical Plan, TezCompiler produces a Tez Plan for the Tez execution engine, while MRCompiler produces an MR Plan for the MR execution engine.)
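From the script author's point of view nothing changes; once Pig on Tez shipped (Pig 0.14 and later), the engine is chosen with the launcher's -x flag, as sketched below with the hypothetical script name from earlier:

pig -x tez top5sites.pig         # execute on Tez
pig -x mapreduce top5sites.pig   # execute on MapReduce (the default)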
33. User Defined Functions
• Ultimate in extensibility and portability
• Custom processing in:
  – Java
  – Python
  – JavaScript
  – Ruby
• Integration with MapReduce phases (see the sketch after this list):
  – Map
  – Combine
  – Reduce
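Wiring a non-Java UDF into a script takes a single register statement; a sketch, assuming a Jython-compatible file myudfs.py whose upper() function is annotated with @outputSchema (all names hypothetical):

register 'myudfs.py' using jython as myfuncs;
Users   = load 'input/users' using PigStorage(',') as (name:chararray, age:int);
Shouted = foreach Users generate myfuncs.upper(name), age;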
34. User Defined Functions
public class MyUDF extends EvalFunc<DataBag>
                   implements Algebraic {
  …
}
• Algebraic functions
• 3-phase execution:
  – Map – called once for each tuple
  – Combiner – called zero or more times for each map result
  – Reduce
35. User Defined Functions
public class MyUDF extends EvalFunc<DataBag>
                   implements Accumulator {
  …
}
• Accumulator functions
• Incremental processing of data
• Called in both map and reduce phases
36. User Defined Functions
public class MyUDF extends FilterFunc {
  …
}
• Filter functions
• Return a boolean based on processing of the tuple
• Called in both map and reduce phases
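Once registered, a filter function drops straight into a BY clause; a sketch using a hypothetical IsSevere UDF packaged in myudfs.jar:

register 'myudfs.jar';
logevents = load 'input/my.log' as (date:chararray, level:chararray, code:int, message:chararray);
-- com.example.pig.IsSevere is a hypothetical FilterFunc returning true for severe events
severe = filter logevents by com.example.pig.IsSevere(level);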