This document describes Yahoo's attribution framework for efficiently performing sparse joins on massive data to attribute display ad impressions and clicks to the serving events. The key challenges are the size disparity between the impression/click events (hundreds of MBs) and serving events (multiple TBs), and the sparse nature of the joins. The framework uses aggressive data partitioning and pruning strategies, and partition-aware efficient join query plans, to enable attributing events over long lookback windows efficiently. Performance comparisons show it significantly outperforms alternative approaches like hash and replicated joins in Pig.
Introduction to Reliable Business Case, the reliable method for investments in business improvement. the free RBC Excel Tool can be downloaded from http://www.reliablechange.eu and offer Return on Investment(ROI), benefit-Cost Ratio(BCR), cash flow, payback, Internal Rate of Return(IRR), benefits analysis, net present value (NPV) and more. The methodology increase the likeliness the benefits will occur in reality. Quick and easy business case method. The RBC Excel Tool can compare up to six alternatives, offer documentation of benefits, formulas, dependencies, risk, and more.
This document is a working paper that examines the interaction between paid search advertising and display advertising. It aims to address whether display ads influence paid search and vice versa, the size of these effects, and their dynamic patterns over time. The paper also discusses the implications for online marketing metrics and optimal budget allocation when accounting for attribution of effects between the different channels and their dynamics. It was written by Pavel Kireyev, Koen Pauwels, and Sunil Gupta, with the goal of better understanding how attribution and dynamics impact the effectiveness of online advertising.
The value of being seen : a guide to digital advertising viewability - Turn -...Romain Fonnier
What’s hard to understand about advertisers wanting ads to be seen by real live consumers?
At first blush, it seems so simple: viewable ads reassure advertisers that their ads are delivered in-view (that is, on an active page or within frame). Yet, viewability does not guarantee that the end consumer will actually view the ad creative or engage with the message.
Successful digital advertising depends on a variety of factors -- including a nuanced understanding of what viewability is, when it matters, and how it fits into a broader campaign strategy.
The document outlines the key players and technologies involved in digital advertising and marketing. It shows the flow of data, ads, and transactions between agencies, advertisers, publishers, ad servers, data management platforms, demand side platforms, supply side platforms, exchanges, and various tools for measurement, analytics, creative optimization, and audience targeting. All of these work together through sharing data and technologies to plan, buy, optimize, and measure digital advertising campaigns across many online and mobile channels and devices.
BrandsLab Media Value Session 4 | Verification - Viewability, Fraud & Brand S...Ebiquity-NA
Ad fraud and viewability issues continue to plague the marketplace if left unmanaged, making ad verification vital to your brand dollars. In this session, out digital experts will go into the different steps you can take to protect your brand and also delve into how these verification issues intersect with programmatic.
This document discusses digital advertising fraud, including the types of fraud, who participates in it, how it works, and how it can be detected and prevented. Some key points:
- Fraud costs the digital advertising industry billions annually through fraudulent impressions and clicks. Various types of fraud include bot traffic, pixel stuffing, and ad stacking.
- Participants include hackers who create botnets, botnet operators based in Eastern Europe, and infected computer owners who are compromised without their knowledge.
- Fraud works by infecting computers with malware that creates bots controlled by botnets. The bots are instructed to generate fraudulent traffic and clicks.
- Detection examines behavioral patterns and signals at the impression level to identify
Multi Channel Attribution - Driving Marketing Spend Planning In The Big Data AgeAbsolutdata Analytics
This presentation was given by Eli Kling, Director - Analytics, AbsolutData at The Business Analytics Conference, AmsterDam, October 2013.
AbsolutData is a global leader in applying analytics to drive sales and increase profits for its customers. AbsolutData has built strong expertise and traction with Fortune 1000 companies across 40 countries. We specialize in big data, high end business analytics, predictive modeling, research, reporting, social media analytics and data management services. AbsolutData delivers world class analytics solutions by combining their expertise in industry domains, analytical techniques and sophisticated tools
The document provides an overview of the Hadoop Distributed File System (HDFS). It describes HDFS's master-slave architecture with a single NameNode master and multiple DataNode slaves. The NameNode manages filesystem metadata and data placement, while DataNodes store data blocks. The document outlines HDFS components like the SecondaryNameNode, DataNodes, and how files are written and read. It also discusses high availability solutions, operational tools, and the future of HDFS.
Introduction to Reliable Business Case, the reliable method for investments in business improvement. the free RBC Excel Tool can be downloaded from http://www.reliablechange.eu and offer Return on Investment(ROI), benefit-Cost Ratio(BCR), cash flow, payback, Internal Rate of Return(IRR), benefits analysis, net present value (NPV) and more. The methodology increase the likeliness the benefits will occur in reality. Quick and easy business case method. The RBC Excel Tool can compare up to six alternatives, offer documentation of benefits, formulas, dependencies, risk, and more.
This document is a working paper that examines the interaction between paid search advertising and display advertising. It aims to address whether display ads influence paid search and vice versa, the size of these effects, and their dynamic patterns over time. The paper also discusses the implications for online marketing metrics and optimal budget allocation when accounting for attribution of effects between the different channels and their dynamics. It was written by Pavel Kireyev, Koen Pauwels, and Sunil Gupta, with the goal of better understanding how attribution and dynamics impact the effectiveness of online advertising.
The value of being seen : a guide to digital advertising viewability - Turn -...Romain Fonnier
What’s hard to understand about advertisers wanting ads to be seen by real live consumers?
At first blush, it seems so simple: viewable ads reassure advertisers that their ads are delivered in-view (that is, on an active page or within frame). Yet, viewability does not guarantee that the end consumer will actually view the ad creative or engage with the message.
Successful digital advertising depends on a variety of factors -- including a nuanced understanding of what viewability is, when it matters, and how it fits into a broader campaign strategy.
The document outlines the key players and technologies involved in digital advertising and marketing. It shows the flow of data, ads, and transactions between agencies, advertisers, publishers, ad servers, data management platforms, demand side platforms, supply side platforms, exchanges, and various tools for measurement, analytics, creative optimization, and audience targeting. All of these work together through sharing data and technologies to plan, buy, optimize, and measure digital advertising campaigns across many online and mobile channels and devices.
BrandsLab Media Value Session 4 | Verification - Viewability, Fraud & Brand S...Ebiquity-NA
Ad fraud and viewability issues continue to plague the marketplace if left unmanaged, making ad verification vital to your brand dollars. In this session, out digital experts will go into the different steps you can take to protect your brand and also delve into how these verification issues intersect with programmatic.
This document discusses digital advertising fraud, including the types of fraud, who participates in it, how it works, and how it can be detected and prevented. Some key points:
- Fraud costs the digital advertising industry billions annually through fraudulent impressions and clicks. Various types of fraud include bot traffic, pixel stuffing, and ad stacking.
- Participants include hackers who create botnets, botnet operators based in Eastern Europe, and infected computer owners who are compromised without their knowledge.
- Fraud works by infecting computers with malware that creates bots controlled by botnets. The bots are instructed to generate fraudulent traffic and clicks.
- Detection examines behavioral patterns and signals at the impression level to identify
Multi Channel Attribution - Driving Marketing Spend Planning In The Big Data AgeAbsolutdata Analytics
This presentation was given by Eli Kling, Director - Analytics, AbsolutData at The Business Analytics Conference, AmsterDam, October 2013.
AbsolutData is a global leader in applying analytics to drive sales and increase profits for its customers. AbsolutData has built strong expertise and traction with Fortune 1000 companies across 40 countries. We specialize in big data, high end business analytics, predictive modeling, research, reporting, social media analytics and data management services. AbsolutData delivers world class analytics solutions by combining their expertise in industry domains, analytical techniques and sophisticated tools
The document provides an overview of the Hadoop Distributed File System (HDFS). It describes HDFS's master-slave architecture with a single NameNode master and multiple DataNode slaves. The NameNode manages filesystem metadata and data placement, while DataNodes store data blocks. The document outlines HDFS components like the SecondaryNameNode, DataNodes, and how files are written and read. It also discusses high availability solutions, operational tools, and the future of HDFS.
This document discusses the state of digital ad fraud. It finds that ad fraud is extremely lucrative and scalable, with profit margins of 80-99% for fraudsters. Fraud operations are also massively scalable through techniques like using thousands of fake websites and bots. Digital ad fraud is now one of the largest forms of crime, estimated at $31 billion annually in the US alone. The document examines how fraud harms the digital ad ecosystem and good publishers through stolen ad revenue and bad measurements. It finds current bot and fraud detection capabilities still limited despite the scale of the problem.
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
Hive is a data warehousing system built on Hadoop that allows users to query data using SQL. It addresses issues with using Hadoop for analytics like programmability and metadata. Hive uses a metastore to manage metadata and supports structured data types, SQL queries, and custom MapReduce scripts. At Facebook, Hive is used for analytics tasks like summarization, ad hoc analysis, and data mining on over 180TB of data processed daily across a Hadoop cluster.
Apache Hive provides SQL-like access to your stored data in Apache Hadoop. Apache HBase stores tabular data in Hadoop and supports update operations. The combination of these two capabilities is often desired, however, the current integration show limitations such as performance issues. In this talk, Enis Soztutar will present an overview of Hive and HBase and discuss new updates/improvements from the community on the integration of these two projects. Various techniques used to reduce data exchange and improve efficiency will also be provided.
Chango - DDM Alliance Summit Marketing on FacebookDDM Alliance
This document discusses how marketers can leverage their marketing data across different channels like display, social media, mobile, and video using programmatic advertising. It notes that while big data means a large amount of unstructured data, programmatic marketing can make this data actionable. Specifically, the document highlights that a programmatic advertising platform collects over 20,000 GB of data per day including search queries and page views, and this data can be used to deliver more personalized experiences to customers across channels in real-time.
HDFS is a distributed file system designed for storing very large data files across commodity servers or clusters. It works on a master-slave architecture with one namenode (master) and multiple datanodes (slaves). The namenode manages the file system metadata and regulates client access, while datanodes store and retrieve block data from their local file systems. Files are divided into large blocks which are replicated across datanodes for fault tolerance. The namenode monitors datanodes and replicates blocks if their replication drops below a threshold.
Big Data and Hadoop training course is designed to provide knowledge and skills to become a successful Hadoop Developer. In-depth knowledge of concepts such as Hadoop Distributed File System, Setting up the Hadoop Cluster, Map-Reduce,PIG, HIVE, HBase, Zookeeper, SQOOP etc. will be covered in the course.
Slides from my session at MeasureWorks' Performance Labs... Topic is Observability, the new buzzword in Web Performance & DevOps, trying to explain what it is and why it matters for your operations...
Softtek Break Through Savings No Need Offshore 2011 Asug FinalMauro Okamoto
The document discusses achieving breakthrough savings without offshoring through transforming an organization's application delivery model. It describes a 4 step process: 1) Establishing a governance model focused on business objectives and processes. 2) Mapping processes to identify improvements. 3) Implementing business service level agreements. 4) Revisiting the governance model. The approach focuses on innovation, productivity and business-driven services over short-term offshoring savings. A case study shows transforming an organization to reduce IT spending by 79% while improving performance.
Dreamforce'12 - Automate Business Processes with Force.comMudit Agarwal
Force.com is a powerful platform, and at VMWare we are always looking for new ways leverage the power of the platform. Over time we’ve developed several custom applications on Force.com to automate our business processes and meet our unique business requirements. In this session, we will discuss two such custom applications that we built to solve critical business process automation needs. For each application, we’ll review the use case, benefits and the specific Force.com technologies used to develop the solution.
Samanage Benchmarking: Better Service Performance Starts HereSamanage
Samanage Benchmarking provides real-time performance benchmarks against hundreds of service desks worldwide. It automates data collection, performance reporting, diagnosis, and action planning, and charts your journey to world-class performance. The service utilizes the Samanage service desk product together with MetricNet’s Key Performance Indicators to show customers how they are performing compared to peers both inside and outside of their industry.
5 Best Practices for Successful Cloud Deployments – and the Pitfalls to AvoidCompuware APM
Companies that rely on enforcing Service Level Agreements (SLAs) from their Cloud Service Providers to manage the performance of their cloud applications are increasingly discovering that they are failing to meet their business objectives. The reality is that by the time an SLA has been missed, your end users and customers have already been severely impacted, resulting in poor adoption and missed revenue opportunities.
Successful cloud deployments require real-time visibility of application performance and service level trends across the entire delivery chain – from first mile (your data center) to last mile (your users). This insures that companies are able to detect problems anywhere in the delivery chain, before end users, customers, and your business are impacted.
This webinar will provide evidence of the revenue impact of poor cloud application performance and service level management, and will present an integrated cross-domain services and application performance management strategy that will dramatically improve the business results that you achieve with Cloud Computing. The information will be of use to anyone who is responsible for the successful execution of their company’s cloud strategy, both in the line-of-business and IT.
Dennis Drogseth, Vice President, Enterprise Management Associates, Inc and Richard Stone, Compuware’s Cloud Solutions Manager talk about:
• Research results showing how and where companies are benefitting most, and conversely where they’ve had issues, or had to rethink and redirect their cloud initiatives
• Best practices to adopt – and pitfalls to avoid – that will enable you to get cloud deployments right the first time from both a process and an APM perspective
This document discusses managed services and provides an overview of key topics:
- It defines managed services as outsourcing of core business functions like network, hosted services, and IT management to a third party provider.
- Managed services offer benefits like reducing capital and operational expenditures, improving network performance through vendor expertise, and allowing companies to focus on their core business.
- The document compares managed services to outsourcing, noting that managed services transfer day-to-day management and aim to deliver standardized IT functions as a service.
- Top managed services offered include network management, IT infrastructure management, security management, and unified communications. The needs of telecom operators and expectations from managed service providers are also
Rundeck is an open source automation platform that allows users to define, schedule, and execute jobs across multiple servers to automate system administration tasks. It was started in 2010 and has over 100 contributors. Rundeck provides a web interface, API, and CLI to define workflows and orchestrate tasks. It supports plugins for popular DevOps tools and can integrate with other systems like ITSM tools. Rundeck Enterprise provides additional features like high availability, security controls, and support.
What does performance mean in the cloudMichael Kopp
Performance problems are one of the most cited concerns about to the cloud. But is it really the cloud or the application? What does performance mean anyway when you can scale to thousands of servers? This session will discuss why traditional means of performance management and troubleshooting no longer work and how this affects everything. Most importantly we will look at how to identify the root cause of performance problems in such dynamic environments. Finally we will explain how to assess and manage performance when capacity is no longer the issue.
This document discusses managed services and provides an overview of key topics:
- It defines managed services as outsourcing of core business functions like network, hosted services, and IT. Managed service providers plan, deploy, maintain, and optimize the network.
- Benefits of managed services include reducing CAPEX and OPEX, gaining access to technical expertise, improving network performance and efficiency, and allowing companies to focus on their core business.
- The document compares managed services to outsourcing, noting that managed services transfer day-to-day management and focus on delivering specific IT functions as a service.
- A case study outlines how managed services can help telecom operators address challenges like falling revenues
The document discusses how to build trust in cloud computing. It recommends a four-layer approach: 1) Educate yourself on cloud terms and security measures; 2) Monitor cloud services and infrastructure for issues; 3) Establish processes for training, escalation, and documentation; 4) Practice failover procedures by backing up data and testing backup systems. Following these steps can help address common concerns about lack of control, visibility and reliability in cloud computing.
The survey results from landlords and tenants showed that most respondents were tenants (61%) followed by landlords (25%) and those who were both (14%). Most landlords owned 2-3 rental units and managed their properties themselves rather than using a property manager due to the cost. Their top pain points were finding tenants, maintenance issues, and payments. The easiest tasks for landlords were credit checks, finding tenants through sites like Craigslist, and collecting rent payments. Landlords felt the most important potential features of a rental management site would be tenant reputation histories, integrated communication tools, and one credit check accepted by multiple landlords.
“What the hell is cloud computing?” After a year, those infamous words of Oracle CEO Larry Ellison still resonate. The definition of cloud computing is hazy at best, and many companies remain wary of the technology over concerns about infrastructure, security and regulation.
Cloud computing has unique potential to save the enterprise cost, reduce complexity and provide highly available service to the end-user or client. With such compelling benefits, companies should look to understand cloud better—what it is, what it isn’t and what it will be.
In this webinar, Yankee Group analysts Agatha Poon and Camille Mendler define cloud computing and explore the capabilities and challenges of the technology.
Warranty Outsourcing For Strategic GainsImranMasood
1) Warranty management involves many challenges across claims management, field support, and supplier recovery with issues like fraudulent claims, slow cycle times, and suboptimal utilization of resources.
2) Outsourcing warranty management can provide benefits like a 1/3 reduction in costs through process interventions across the entire warranty value chain.
3) When selecting a partner, companies should look for one that can provide a customized solution using best-in-class tools and processes, focus on the right metrics around both contractual SLAs and business outcomes, provide cross-industry benchmarks and analytics capabilities to drive improvements.
This document discusses the state of digital ad fraud. It finds that ad fraud is extremely lucrative and scalable, with profit margins of 80-99% for fraudsters. Fraud operations are also massively scalable through techniques like using thousands of fake websites and bots. Digital ad fraud is now one of the largest forms of crime, estimated at $31 billion annually in the US alone. The document examines how fraud harms the digital ad ecosystem and good publishers through stolen ad revenue and bad measurements. It finds current bot and fraud detection capabilities still limited despite the scale of the problem.
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
Hive is a data warehousing system built on Hadoop that allows users to query data using SQL. It addresses issues with using Hadoop for analytics like programmability and metadata. Hive uses a metastore to manage metadata and supports structured data types, SQL queries, and custom MapReduce scripts. At Facebook, Hive is used for analytics tasks like summarization, ad hoc analysis, and data mining on over 180TB of data processed daily across a Hadoop cluster.
Apache Hive provides SQL-like access to your stored data in Apache Hadoop. Apache HBase stores tabular data in Hadoop and supports update operations. The combination of these two capabilities is often desired, however, the current integration show limitations such as performance issues. In this talk, Enis Soztutar will present an overview of Hive and HBase and discuss new updates/improvements from the community on the integration of these two projects. Various techniques used to reduce data exchange and improve efficiency will also be provided.
Chango - DDM Alliance Summit Marketing on FacebookDDM Alliance
This document discusses how marketers can leverage their marketing data across different channels like display, social media, mobile, and video using programmatic advertising. It notes that while big data means a large amount of unstructured data, programmatic marketing can make this data actionable. Specifically, the document highlights that a programmatic advertising platform collects over 20,000 GB of data per day including search queries and page views, and this data can be used to deliver more personalized experiences to customers across channels in real-time.
HDFS is a distributed file system designed for storing very large data files across commodity servers or clusters. It works on a master-slave architecture with one namenode (master) and multiple datanodes (slaves). The namenode manages the file system metadata and regulates client access, while datanodes store and retrieve block data from their local file systems. Files are divided into large blocks which are replicated across datanodes for fault tolerance. The namenode monitors datanodes and replicates blocks if their replication drops below a threshold.
Big Data and Hadoop training course is designed to provide knowledge and skills to become a successful Hadoop Developer. In-depth knowledge of concepts such as Hadoop Distributed File System, Setting up the Hadoop Cluster, Map-Reduce,PIG, HIVE, HBase, Zookeeper, SQOOP etc. will be covered in the course.
Slides from my session at MeasureWorks' Performance Labs... Topic is Observability, the new buzzword in Web Performance & DevOps, trying to explain what it is and why it matters for your operations...
Softtek Break Through Savings No Need Offshore 2011 Asug FinalMauro Okamoto
The document discusses achieving breakthrough savings without offshoring through transforming an organization's application delivery model. It describes a 4 step process: 1) Establishing a governance model focused on business objectives and processes. 2) Mapping processes to identify improvements. 3) Implementing business service level agreements. 4) Revisiting the governance model. The approach focuses on innovation, productivity and business-driven services over short-term offshoring savings. A case study shows transforming an organization to reduce IT spending by 79% while improving performance.
Dreamforce'12 - Automate Business Processes with Force.comMudit Agarwal
Force.com is a powerful platform, and at VMWare we are always looking for new ways leverage the power of the platform. Over time we’ve developed several custom applications on Force.com to automate our business processes and meet our unique business requirements. In this session, we will discuss two such custom applications that we built to solve critical business process automation needs. For each application, we’ll review the use case, benefits and the specific Force.com technologies used to develop the solution.
Samanage Benchmarking: Better Service Performance Starts HereSamanage
Samanage Benchmarking provides real-time performance benchmarks against hundreds of service desks worldwide. It automates data collection, performance reporting, diagnosis, and action planning, and charts your journey to world-class performance. The service utilizes the Samanage service desk product together with MetricNet’s Key Performance Indicators to show customers how they are performing compared to peers both inside and outside of their industry.
5 Best Practices for Successful Cloud Deployments – and the Pitfalls to AvoidCompuware APM
Companies that rely on enforcing Service Level Agreements (SLAs) from their Cloud Service Providers to manage the performance of their cloud applications are increasingly discovering that they are failing to meet their business objectives. The reality is that by the time an SLA has been missed, your end users and customers have already been severely impacted, resulting in poor adoption and missed revenue opportunities.
Successful cloud deployments require real-time visibility of application performance and service level trends across the entire delivery chain – from first mile (your data center) to last mile (your users). This insures that companies are able to detect problems anywhere in the delivery chain, before end users, customers, and your business are impacted.
This webinar will provide evidence of the revenue impact of poor cloud application performance and service level management, and will present an integrated cross-domain services and application performance management strategy that will dramatically improve the business results that you achieve with Cloud Computing. The information will be of use to anyone who is responsible for the successful execution of their company’s cloud strategy, both in the line-of-business and IT.
Dennis Drogseth, Vice President, Enterprise Management Associates, Inc and Richard Stone, Compuware’s Cloud Solutions Manager talk about:
• Research results showing how and where companies are benefitting most, and conversely where they’ve had issues, or had to rethink and redirect their cloud initiatives
• Best practices to adopt – and pitfalls to avoid – that will enable you to get cloud deployments right the first time from both a process and an APM perspective
This document discusses managed services and provides an overview of key topics:
- It defines managed services as outsourcing of core business functions like network, hosted services, and IT management to a third party provider.
- Managed services offer benefits like reducing capital and operational expenditures, improving network performance through vendor expertise, and allowing companies to focus on their core business.
- The document compares managed services to outsourcing, noting that managed services transfer day-to-day management and aim to deliver standardized IT functions as a service.
- Top managed services offered include network management, IT infrastructure management, security management, and unified communications. The needs of telecom operators and expectations from managed service providers are also
Rundeck is an open source automation platform that allows users to define, schedule, and execute jobs across multiple servers to automate system administration tasks. It was started in 2010 and has over 100 contributors. Rundeck provides a web interface, API, and CLI to define workflows and orchestrate tasks. It supports plugins for popular DevOps tools and can integrate with other systems like ITSM tools. Rundeck Enterprise provides additional features like high availability, security controls, and support.
What does performance mean in the cloudMichael Kopp
Performance problems are one of the most cited concerns about to the cloud. But is it really the cloud or the application? What does performance mean anyway when you can scale to thousands of servers? This session will discuss why traditional means of performance management and troubleshooting no longer work and how this affects everything. Most importantly we will look at how to identify the root cause of performance problems in such dynamic environments. Finally we will explain how to assess and manage performance when capacity is no longer the issue.
This document discusses managed services and provides an overview of key topics:
- It defines managed services as outsourcing of core business functions like network, hosted services, and IT. Managed service providers plan, deploy, maintain, and optimize the network.
- Benefits of managed services include reducing CAPEX and OPEX, gaining access to technical expertise, improving network performance and efficiency, and allowing companies to focus on their core business.
- The document compares managed services to outsourcing, noting that managed services transfer day-to-day management and focus on delivering specific IT functions as a service.
- A case study outlines how managed services can help telecom operators address challenges like falling revenues
The document discusses how to build trust in cloud computing. It recommends a four-layer approach: 1) Educate yourself on cloud terms and security measures; 2) Monitor cloud services and infrastructure for issues; 3) Establish processes for training, escalation, and documentation; 4) Practice failover procedures by backing up data and testing backup systems. Following these steps can help address common concerns about lack of control, visibility and reliability in cloud computing.
The survey results from landlords and tenants showed that most respondents were tenants (61%) followed by landlords (25%) and those who were both (14%). Most landlords owned 2-3 rental units and managed their properties themselves rather than using a property manager due to the cost. Their top pain points were finding tenants, maintenance issues, and payments. The easiest tasks for landlords were credit checks, finding tenants through sites like Craigslist, and collecting rent payments. Landlords felt the most important potential features of a rental management site would be tenant reputation histories, integrated communication tools, and one credit check accepted by multiple landlords.
“What the hell is cloud computing?” After a year, those infamous words of Oracle CEO Larry Ellison still resonate. The definition of cloud computing is hazy at best, and many companies remain wary of the technology over concerns about infrastructure, security and regulation.
Cloud computing has unique potential to save the enterprise cost, reduce complexity and provide highly available service to the end-user or client. With such compelling benefits, companies should look to understand cloud better—what it is, what it isn’t and what it will be.
In this webinar, Yankee Group analysts Agatha Poon and Camille Mendler define cloud computing and explore the capabilities and challenges of the technology.
Warranty Outsourcing For Strategic GainsImranMasood
1) Warranty management involves many challenges across claims management, field support, and supplier recovery with issues like fraudulent claims, slow cycle times, and suboptimal utilization of resources.
2) Outsourcing warranty management can provide benefits like a 1/3 reduction in costs through process interventions across the entire warranty value chain.
3) When selecting a partner, companies should look for one that can provide a customized solution using best-in-class tools and processes, focus on the right metrics around both contractual SLAs and business outcomes, provide cross-industry benchmarks and analytics capabilities to drive improvements.
JDX is a cloud-based, enterprise grade, creative lifecycle management suite comprising products that help to effectively manage the end-to-end lifecycle of creative ordering, production, workflow, serving, optimization and measurement for media, brands and enterprises at scale
Website: https://jdxsuite.com/
The document discusses implementing a service-oriented architecture (SOA) for the US government. It notes that a survey found 56% of federal IT professionals believe their agency would benefit from an SOA. It provides tips for making an SOA successful, including understanding business objectives and defining value, focusing on understanding requirements, considering people impacts, and taking a long-term focus.
IT Infrastructure Outsourcing Benefits Demystified CTRLS
With data emerging as the biggest asset for organizations, the need and
importance of data centers today cannot be debated. The term data center is
nothing but a facility used to house computing, network and storage equipment.
Data centers can range from a small facility (also called a server room) in some
enterprises to massive-scale infrastructure for enterprises with huge computing
requirements. But in the present day, the increasingly demanding customer
needs, new applications, and advanced infrastructure options and compelling
organizations to rapidly outgrow their existing data centers.
With over 20,000 racks planned across India, CtrlS is India’s first Tier IV Datacenter and the preferred choice of corporate houses both large and small.
Daniel Jasník - ITSMF pro cloudové služby - AID2019ALVAO
Daniel Jasník má více než 15 let zkušeností v oblasti IT, z toho 7 let v ITSM. Nyní poskytuje své konzultantské služby Enterprise zákazníkům Microsoftu v EMEA region s cílem umožnit jim dosáhnout všech výhod, které Microsoft Cloud přináší, a to aplikováním Microsoft Modern Service Management přístupu.
This document discusses IT service level agreements (SLAs). It begins by explaining why companies need network operators to maintain IT services. Next, it defines what an SLA is, which is an agreement that measures IT service quality against customer expectations. The document then discusses how SLA metrics like uptime are interpreted and why maintaining SLAs is important. Finally, it outlines factors that influence maintaining SLAs both internally, like high availability infrastructure, and externally through connectivity providers, as well as controls like monitoring, maintenance, and recovery planning.
BROTIGHT is an IT professional services provider that offers managed IT services and solutions. It has over 10 years of experience and covers tier 1, 2, and 3 cities in China and overseas locations. BROTIGHT has helped clients like petroleum, technology, and retail companies optimize their infrastructure through projects involving data center relocation, network enhancement, and virtualization implementation.
Sciencelogic - A Leader in IT Transformation Chris Phillips
ScienceLogic delivers the next generation IT monitoring platform for the Internet of everything. Over 20,000 global Service Providers, enterprises, and government organisations rely on ScienceLogic every day to significantly enhance their IT operations. With complete Hybrid IT monitoring, total Amazon Web Services (AWS)/ Microsoft Azure visibility, and over 1,000 dynamic management Apps included in the platform, our customers are able to intelligently maximise efficiency, optimise operations, and ensure business continuity.
Similar to Yahoo Display Advertising Attribution (20)
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques followed by light introduction to DL and short discussion what is current state-of-the-art. Several python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick and short hands-on introduction to ML with python’s scikit-learn library. The environment in CDSW is interactive and the step-by-step guide will walk you through setting up your environment, to exploring datasets, training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud, no installation needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1hr in). Basic knowledge of python highly recommended.
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used are the specific durability requirements of HBase's write-ahead log (WAL) and HDFS providing that guarantee correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
Utilizing Apache NiFi we read various open data REST APIs and camera feeds to ingest crime and related data real-time streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables as well as Hive external tables to HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
Whilst HBase is the most logical answer for use cases requiring random, realtime read/write access to Big Data, it may not be so trivial to design applications that make most of its use, neither the most simple to operate. As it depends/integrates with other components from Hadoop ecosystem (Zookeeper, HDFS, Spark, Hive, etc) or external systems ( Kerberos, LDAP), and its distributed nature requires a "Swiss clockwork" infrastructure, many variables are to be considered when observing anomalies or even outages. Adding to the equation there's also the fact that HBase is still an evolving product, with different release versions being used currently, some of those can carry genuine software bugs. On this presentation, we'll go through the most common HBase issues faced by different organisations, describing identified cause and resolution action over my last 5 years supporting HBase to our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world’s library collection. This talk will provide an overview of how HBase is structured to provide this information and some of the challenges they have encountered to scale to support the world catalog and how they have overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
Danny Chen presented on Uber's use of HBase for global indexing to support large-scale data ingestion. Uber uses HBase to provide a global view of datasets ingested from Kafka and other data sources. To generate indexes, Spark jobs are used to transform data into HFiles, which are loaded into HBase tables. Given the large volumes of data, techniques like throttling HBase access and explicit serialization are used. The global indexing solution supports requirements for high throughput, strong consistency and horizontal scalability across Uber's data lake.
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
Recently, Apache Phoenix has been integrated with Apache (incubator) Omid transaction processing service, to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
This document discusses using Apache NiFi to build a high-speed cyber security data pipeline. It outlines the challenges of ingesting, transforming, and routing large volumes of security data from various sources to stakeholders like security operations centers, data scientists, and executives. It proposes using NiFi as a centralized data gateway to ingest data from multiple sources using a single entry point, transform the data according to destination needs, and reliably deliver the data while avoiding issues like network traffic and data duplication. The document provides an example NiFi flow and discusses metrics from processing over 20 billion events through 100+ production flows and 1000+ transformations.
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
This document discusses supporting Apache HBase and improving troubleshooting and supportability. It introduces two Cloudera employees who work on HBase support and provides an overview of typical troubleshooting scenarios for HBase like performance degradation, process crashes, and inconsistencies. The agenda covers using existing tools like logs and metrics to troubleshoot HBase performance issues with a general approach, and introduces htop as a real-time monitoring tool for HBase.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as Geospatial analytics at scale and the project roadmap going forward.
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MlFlow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code in the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results will be logged automatically as a byproduct of those lines of code being added, even if the party doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub , almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo MlFlow Tracking , Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MlFlow on-prem or in the cloud.
Extending Twitter's Data Platform to Google CloudDataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in house projects to support Data Analytics on hundreds of petabytes of data. Our platform support storage, compute, data ingestion, discovery and management and various tools and libraries to help users for both batch and realtime analytics. Our DataPlatform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use cloud as another datacenter. We walk through our evaluation process, challenges we faced supporting data analytics at Twitter scale on cloud and present our current solution. Extending Twitter's Data platform to cloud was complex task which we deep dive in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is in securing data across hybrid environments with easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise as well as in cloud environments. We will go into details into the challenges of hybrid environment and how Ranger can solve it. We will also talk through how companies can further enhance the security by leveraging Ranger to anonymize or tokenize data while moving into the cloud and de-anonymize dynamically using Apache Hive, Apache Spark or when accessing data from cloud storage systems. We will also deep dive into the Ranger’s integration with AWS S3, AWS Redshift and other cloud native systems. We will wrap it up with an end to end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data and track where data is flowing.
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies, enable real-time customer engagement
● Enhancing loss prevention capabilities, response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images describing the possible ways that a retail store of the near future could operate. Identifying various storefront situations by having a deep learning system attached to a camera stream. Such things as; identifying item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies that are powering these applications today. Deep learning tools for research and development. Production tools to distribute that intelligence to an entire inventory of all the cameras situation around a retail location. Tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Whole genome shotgun based next generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires an ideal solution that both scales with data size and optimizes for individual gene or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions the reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short read and long read sequencing technologies. It achieved a near linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modifications while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from the next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
AI in the Workplace Reskilling, Upskilling, and Future Work.pptxSunil Jagani
Discover how AI is transforming the workplace and learn strategies for reskilling and upskilling employees to stay ahead. This comprehensive guide covers the impact of AI on jobs, essential skills for the future, and successful case studies from industry leaders. Embrace AI-driven changes, foster continuous learning, and build a future-ready workforce.
Read More - https://bit.ly/3VKly70
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfleebarnesutopia
So… you want to become a Test Automation Engineer (or hire and develop one)? While there’s quite a bit of information available about important technical and tool skills to master, there’s not enough discussion around the path to becoming an effective Test Automation Engineer that knows how to add VALUE. In my experience this had led to a proliferation of engineers who are proficient with tools and building frameworks but have skill and knowledge gaps, especially in software testing, that reduce the value they deliver with test automation.
In this talk, Lee will share his lessons learned from over 30 years of working with, and mentoring, hundreds of Test Automation Engineers. Whether you’re looking to get started in test automation or just want to improve your trade, this talk will give you a solid foundation and roadmap for ensuring your test automation efforts continuously add value. This talk is equally valuable for both aspiring Test Automation Engineers and those managing them! All attendees will take away a set of key foundational knowledge and a high-level learning path for leveling up test automation skills and ensuring they add value to their organizations.
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Keywords: AI, Containeres, Kubernetes, Cloud Native
Event Link: https://meine.doag.org/events/cloudland/2024/agenda/#agendaId.4211
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
Yahoo Display Advertising Attribution
1. Yahoo! Display Ads Attribution
Framework:
A Problem Of Efficient Sparse Joins On Massive Data
Supreeth, Sundeep, Chenjie, Chinmay
Data Team, Yahoo!
1
2. Agenda
§ Problem description
› Serves impressions clicks
› Attribution
§ Class of problems and application in other use cases
§ Attribution framework
§ Performance comparison
§ Conclusion
2
3. Serves Impressions Clicks
Web Ad
Servers Servers
Be the first place people go when they
want to find, explore, and participate with
Impressionsnews, from serious forfun. ad shown
all forms of – client side event to an Serves - Server logged event for
Clicks – client side event for a click on an ad an ad served. Serve has
Interactions – client side events for interactions complete context
within an ad
Serve events are heavy and is
Impressions clicks and conversions are a few a few 10s of KBs
bytes
Serve Guid + Serve timestamp + {other fields of Serve Guid + Serve timestamp + {other
Join
impressions/clicks/interactions} fields of serve}
* Guid is global unique identifier
3
4. Need For Attribution
Serves
5m
Several hours to days Older instances
Impressions/Clicks
Every 5 mins
Attribute an impression/click with the serve
4
5. Distribution Of % Impressions Arrived
From The Client Side wrt Serves
% of Impressions for a serve
90
80
70
60
50
%of Impressions for a serve
40
30 t1->201205301000
t2->201205300955
20 t3->201205300950
.
10 .
.
0
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t12 t13
Time period from when the serves happened
5
6. Distribution Of % Clicks Arrived From
The Client Side wrt Serves
%of Clicks for a serve
45
40
35
30
25
%of Clicks for a serve
20
15
t1->201205301000
t2->201205300955
10
t3->201205300950
.
5 .
.
0
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t12 t13
Time period from when the serves happened
6
7. Class Of Problems
§ Sparse joins spanning TBs of data on grid
§ Few MBs to a few TBs
§ Left outer join or any other outer join
Data Set Impressions Serves (5m*288)
Data Size 400MB 20GB *288 ~= 5.6 TB
(Compressed size)
7
8. Similar Use Cases
§ Associating video, click, social interactions back to the
activity data
§ Attribute back a small size client beacon to a large
dataset
§ Within Yahoo
› Audience view/click attribution
› Weblog based investigation
› Joining dimensional data with web traffic data
8
9. Pig Joins And Problem Fit
Join Strategy Comments Cost
Merge join The datasets are not sorted High
Hash join Shuffle and reduce time High
Replicated Join Does not meet performance High
needs; left outer join on the
replicated dataset
Skewed Join Data set is not skewed N/A
9
10. Problem Statement
To do a sparse outer join on a very large
dataset with high performance requirements
for display ad attribution on grid
10
11. Attribution Framework - Overview
Smart Instrumentation Strategies
Aggressive partitioning and selection
Partition Aware Efficient Join Query
Plan
11
12. Instrument For Attribution
Ø Smart Instrumentation
Strategies
§ Serve guid Aggressive partitioning and
selection
§ Clues which can help you partition better Partition Aware Efficient Join
Query Plan
› Timestamp of the serve
§ Partition keys used in event instrumentation
§ In the impression attribution example:
Impression Serves
Serve Guid + Serve timestamp + {other fields of Serve Guid + Serve timestamp + {other
impressions/clicks/interactions} fields of serve}
12
13. Partitioning approach
§ Join key based partitioning Smart Instrumentation
Strategies
§ Keys for leveraging physical partitioning Ø Aggressive partitioning
and selection
› timestamp Partition Aware Efficient Join
Query Plan
§ Use of hashes in partitioning
› HashFNV, Murmur
Key Partition Type
Join keys Hash
Timestamp Range
13
14. Pruning/Selection
§ Hashing of keys in the data sets Smart Instrumentation
Strategies
Ø Aggressive partitioning
§ Pruning of partitions and selection
Partition Aware Efficient Join
› Timestamp Query Plan
› Hash of the join key
§ IO costs and partitions
§ Configurable partitions
Key Partition Type Pruning
Join keys Hash Yes
Timestamp Range Yes
14
15. Partition Aware Efficient Join Query
Plans
Stream the selected
Impression event keys Smart Instrumentation
Size : MBs
Serve event partitions Strategies
Size : TBs
Aggressive partitioning and
selection
Ø Partition Aware
Inner Efficient Join Query Plan
Join
Stream full
Annotated impression
Impression event
Size : MBs
Size: Hundreds of MBs
Left outer
join
Complete
Annotated Impression
- in memory data with Serve data
- stream
15
16. Attribution Framework: Capabilities
Smart Instrumentation
Strategies
§ Left outer on impression/click/interaction Aggressive partitioning and
selection
› As long as the impression/click/interaction Partition Aware Efficient Join
Query Plan
exists, we will get a record in output
§ Complete annotation with the serve
§ Distinct join with serves
§ Sparse joins achieved by pruning the partitions
§ Map side joins
16
18. Attribution Framework: Tuning Parameters
§ Serve Partitions: trade off between IO & namespace used
(lookback = 24 hours)
4000 180000
Bytes read
Number of files
3500 160000
140000
3000
120000
2500
100000
2000 Bytes Read(GB)
80000 Namespace Used
1500
60000
1000
40000
500 20000
0 0
2 4 8 16 32 64 128 256 512 1024
Partitions
18
19. Attribution Framework: Tuning Parameters
§ Split Size: trade off between number of mappers and map
task run time
(partitions = 16, lookback = 24 hours)
35000 1200
Number of Mappers
Time taken
30000
1000
25000
800
20000
600 Number of Mappers
15000 Time Taken(s)
400
10000
200
5000
0 0
128MB 1 GB 2 GB 3 GB 4 GB
Split Size
19
20. Comparison With Other PIG Joins
Join Mappers Reducers Lookback Input Size Time to
complete
Left Outer 2800 45 40mins 180GB 42.5m*
Hash Join
Replicated 5680 0 5hours 1TB 7m**
Join
Attribution 5760 0 24hours Effective 5.6 TB;
6m***
Framework With Pruning 1.1 TB
* Best case for hash join 1.5m+15.5m+25.5m (Mapper + Shuffle + Reducer)
** Map time taken
*** 1 min + 2mins + 3mins (Selection/Pruning + Impression partitioning +Join)
20
21. Conclusion
§ For the sparse look up problem, the attribution framework
used works very well and within the performance needs
§ Effective partitioning aids longer lookbacks and reduced
IO
§ The levers in the framework allow for tuning based on the
computation/IO requirements
21
22. Future Steps
§ Use Hbase/Cassandra to store the event grain serve data
and do lookups
§ Use of bloom filter along with an index format
§ Compare the strategy with what Hive does and come up
with a framework using Hive
22