My name is Neta Barkay, and I'm a data scientist at LivePerson.
I'd like to share with you a talk I presented at the Underscore Scala community on "Efficient MapReduce using Scalding".
In this talk I reviewed why Scalding fits big data analysis and how it enables writing quick, intuitive code with the full functionality of vanilla MapReduce, without compromising efficient execution on the Hadoop cluster. In addition, I presented some example Scalding jobs that can be used to get you started, and talked about how you can use Scalding's ecosystem, which includes Cascading and the monoids from the Algebird library.
Read more & Video: https://connect.liveperson.com/community/developers/blog/2014/02/25/scalding-reaching-efficient-mapreduce
Scalding - the not-so-basics @ ScalaDays 2014, by Konrad Malawski
This document discusses various big data technologies and how they relate to each other. It explains that Summingbird is built on top of Scalding and Storm, which are built on top of Cascading, which is built on top of Hadoop. It also discusses how Spark relates and compares to these other technologies.
Scalding - Hadoop Word Count in LESS than 70 lines of code, by Konrad Malawski
Twitter's Scalding is built on top of Cascading, which is built on top of Hadoop. It is essentially a readable, extensible DSL for writing MapReduce jobs.
This is a quick introduction to Scalding and Monoids. Scalding is a Scala library that makes writing MapReduce jobs very easy. Monoids, on the other hand, promise parallelism and make some more challenging algorithms look very easy.
The talk was held at the Helsinki Data Science meetup on January 9th, 2014.
The document discusses different options for performing data analysis on Hadoop clusters, including Scalding, Scoobi, and Scrunch. It provides a brief overview of each option with code examples. While the options are similar, the author notes that work is underway to develop a common API. The key takeaways are that functional programming is well-suited to MapReduce problems, and that using Scalding, Scoobi, or Scrunch can increase productivity over traditional MapReduce.
This document provides information about functions in Apache Hive, including a cheat sheet covering user defined functions (UDFs) and built-in functions. It describes how to create UDFs, UDAFs, and UDTFs in Hive along with examples. The document also lists many common mathematical, string, date and other function types available in Hive with descriptions.
Scalding is a Scala library built on top of Cascading that simplifies the process of defining MapReduce programs. It uses a functional programming approach where data flows are represented as chained transformations on TypedPipes, similar to operations on Scala iterators. This avoids some limitations of the traditional Hadoop MapReduce model by allowing for more flexible multi-step jobs and features like joins. Scalding's type-safe API also provides compile-time type safety, compared to Cascading's runtime type checking.
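The chained-transformation style described above can be sketched with plain Scala collections. This is only an analogy (no Scalding dependency is used here), but a typed word-count pipeline in real Scalding code has the same shape, with `TypedPipe` in place of `List`:

```scala
// A word count expressed as chained transformations on a plain
// Scala collection -- the same shape a Scalding TypedPipe job takes.
val lines = List("hello world", "hello scalding")

val wordCounts: Map[String, Int] =
  lines
    .flatMap(_.toLowerCase.split("\\s+"))   // tokenize, like flatMap on a TypedPipe
    .groupBy(identity)                      // group occurrences by word
    .map { case (word, occurrences) => (word, occurrences.size) }
```

In Scalding the same chain would read from a source, run distributed on the cluster, and write to a sink, but the transformations compose identically.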
Dean Wampler presents on using Scalding, which leverages Cascading, to write MapReduce jobs in a more productive way. Cascading provides higher-level abstractions for building data pipelines and hides much of the boilerplate of the Hadoop MapReduce framework. It allows expressing jobs using concepts like joins and group-bys in a cleaner way focused on the algorithm rather than infrastructure details. Word count is shown implemented in the lower-level MapReduce API versus in Cascading Java code to demonstrate how Cascading minimizes boilerplate and exposes the right abstractions.
This document discusses Scoobi, a Scala library for developing MapReduce applications on Hadoop. Some key points:
1) Scoobi allows developers to write Hadoop MapReduce jobs using a functional programming style in Scala, inspired by Google's FlumeJava. It provides abstractions like DList and DObject to represent distributed datasets and computations.
2) Under the hood, Scoobi compiles Scala code into Java MapReduce jobs that run on Hadoop. It handles partitioning, parallelization, and distribution of data and computation across clusters.
3) Examples show how common operations like filtering, mapping, and reducing can be expressed concisely using the Scoobi API, mirroring Scala's collections API.
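The operations in point 3 map directly onto familiar collection combinators. A minimal sketch using plain in-memory Scala lists (this is not the actual Scoobi `DList` API, which distributes the same style of calls across a cluster):

```scala
// filter / group / aggregate in the style a distributed-list API exposes,
// demonstrated on an in-memory list instead of a distributed one.
val scores = List(("alice", 80), ("bob", 45), ("alice", 70), ("bob", 90))

val passing = scores.filter { case (_, s) => s >= 60 }   // keep passing scores
val byUser  = passing.groupBy(_._1)                      // group pairs by key
val totals  = byUser.map { case (user, xs) => (user, xs.map(_._2).sum) }
```

The point of APIs like Scoobi's is that this in-memory code and the cluster code look the same; only the execution strategy differs.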
The document discusses Hive, an open source data warehousing system built on Hadoop that allows users to query large datasets using SQL. It describes Hive's data model, architecture, query language features like joins and aggregations, optimizations, and provides examples of how queries are executed using MapReduce. The document also covers Hive's metastore, external tables, data types, and extensibility features.
This presentation will demonstrate how you can use the aggregation pipeline with MongoDB, similar to how you would use GROUP BY in SQL, along with the new stage operators coming in 3.4. MongoDB’s Aggregation Framework has many operators that give you the ability to get more value out of your data, discover usage patterns within your data, or power your application. Considerations regarding version, indexing, operators, and saving the output will be reviewed.
This document provides an agenda and overview for a Spark workshop covering Spark basics and streaming. The agenda includes sections on Scala, Spark, Spark SQL, and Spark Streaming. It discusses Scala concepts like vals, vars, defs, classes, objects, and pattern matching. It also covers Spark RDDs, transformations, actions, sources, and the spark-shell. Finally, it briefly introduces Spark concepts like broadcast variables, accumulators, and spark-submit.
This document provides an overview of MongoDB aggregation which allows processing data records and returning computed results. It describes some common aggregation pipeline stages like $match, $lookup, $project, and $unwind. $match filters documents, $lookup performs a left outer join, $project selects which fields to pass to the next stage, and $unwind deconstructs an array field. The document also lists other pipeline stages and aggregation pipeline operators for arithmetic, boolean, and comparison expressions.
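Staying in this document's language, the pipeline stages listed above have close analogues in Scala collection combinators, which can help when reasoning about what each stage does: `$match` behaves like `filter`, `$unwind` like `flatMap` over an array field, and `$project` like `map`. This is an analogy only, not MongoDB code:

```scala
// Documents with an array field, processed stage by stage.
case class Order(user: String, total: Int, items: List[String])

val orders = List(
  Order("ann", 30, List("pen", "ink")),
  Order("bob", 5,  List("pad"))
)

val unwound =
  orders
    .filter(_.total >= 10)                         // $match: keep totals >= 10
    .flatMap(o => o.items.map(i => (o.user, i)))   // $unwind: one row per array item
    .map { case (user, item) => s"$user:$item" }   // $project: shape the output
```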
Introduction to Big Data technologies and Apache Hadoop, by Sages
The document introduces concepts related to Big Data technology including volume, variety, and velocity of data. It discusses Hadoop architecture including HDFS, MapReduce, YARN, and the Hadoop ecosystem. Examples are provided of common Big Data problems and how they can be solved using Hadoop frameworks like Pig, Hive, and Ambari.
Cassandra 3.0 - JSON at scale - StampedeCon 2015, by StampedeCon
This session will explore the new features in Cassandra 3.0, starting with JSON support. Cassandra now allows storing JSON directly to Cassandra rows and vice versa, making it trivial to deploy Cassandra as a component in modern service-oriented architectures.
Cassandra 3.0 also delivers other enhancements to developer productivity: user defined functions let developers deploy custom application logic server side in any language conforming to the Java scripting API, including JavaScript. Global indexes allow indexed queries to scale linearly with the size of the cluster, a first for open-source NoSQL databases.
Finally, we will cover the performance improvements in Cassandra 3.0 as well.
Cascading provides a simpler way to write MapReduce programs through data flows. It uses a pipe and tap metaphor where data flows through pipes and is read from or written to taps. This allows assembling MapReduce jobs as data flow graphs in a more logical way compared to the traditional MapReduce API.
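One way to picture the pipe-and-tap metaphor is as function composition from a source to a sink. A toy sketch in plain Scala, where the names `sourceTap`, `pipe`, and `sinkTap` are illustrative only, not Cascading's API:

```scala
// A "tap" supplies or receives data; a "pipe" is a transformation between taps.
def sourceTap(): List[String] = List("a,1", "b,2")   // read side

def pipe(rows: List[String]): List[(String, Int)] =  // the flow assembly
  rows.map { r =>
    val parts = r.split(",")
    (parts(0), parts(1).toInt)
  }

var written: List[(String, Int)] = Nil
def sinkTap(rows: List[(String, Int)]): Unit =       // write side
  written = rows

// Wiring source -> pipe -> sink is the whole "flow".
sinkTap(pipe(sourceTap()))
```

Cascading's contribution is planning such a flow graph onto one or more MapReduce jobs automatically.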
Apache Spark - Key-Value RDD | Big Data Hadoop Spark Tutorial | CloudxLab, by CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sewz2m
This CloudxLab Key-Value RDD tutorial helps you to understand Key-Value RDD in detail. Below are the topics covered in this tutorial:
1) Spark Key-Value RDD
2) Creating Key-Value Pair RDDs
3) Transformations on Pair RDDs - reduceByKey(func)
4) Count Word Frequency in a File using Spark
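The `reduceByKey(func)` transformation in item 3 merges the values of each key with a binary function. A plain-Scala sketch of its semantics (Spark additionally performs the same merge within each partition before shuffling, which is why the function must be associative):

```scala
// reduceByKey semantics on an in-memory pair collection.
def reduceByKey[K, V](pairs: List[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs
    .groupBy(_._1)                                        // gather values per key
    .map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) } // merge them with f

// Item 4's word-frequency count is just reduceByKey with addition.
val counts = reduceByKey(List(("spark", 1), ("rdd", 1), ("spark", 1)))(_ + _)
```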
Using Cerberus and PySpark to validate semi-structured datasets, by Bartosz Konieczny
This short presentation shows one way to integrate Cerberus and PySpark. It was initially given at the Paris.py meetup (https://www.meetup.com/Paris-py-Python-Django-friends/events/264404036/).
Algebird: Abstract Algebra for big data analytics, Devoxx 2014, by Samir Bessalah
The document discusses the Algebird library, which uses concepts from abstract algebra to enable distributed analytics. It describes how algebraic structures like monoids allow operations to be associative, commutative and parallelizable. This supports scalable analysis of large datasets using techniques such as sketches, bloom filters and priority queues. Algebird provides implementations of these structures to perform tasks like top-k analysis, cardinality estimation and streaming analytics in both batch and real-time systems.
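A minimal illustration of why a monoid's associativity matters for distribution: chunks of data can be reduced independently (the "map side") and the partial results combined in any grouping (the "reduce side"). This sketch defines its own tiny `Monoid` trait rather than using Algebird's:

```scala
// An associative combine with an identity element lets work be split freely.
trait Monoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

val intAddition = new Monoid[Int] {
  def zero = 0
  def plus(a: Int, b: Int) = a + b
}

def reduceChunks[T](chunks: List[List[T]], m: Monoid[T]): T = {
  val partials = chunks.map(_.foldLeft(m.zero)(m.plus)) // per-chunk "mappers"
  partials.foldLeft(m.zero)(m.plus)                     // combining "reducer"
}

val total = reduceChunks(List(List(1, 2), List(3, 4, 5)), intAddition)
```

The structures mentioned above (sketches, Bloom filters, HyperLogLog counters) are valuable precisely because their merge operations form monoids, so the same split-and-combine scheme applies.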
Scalding provides a Scala API on top of Cascading to make MapReduce programming easier and more expressive. It allows writing MapReduce jobs in a functional style with fewer lines of code compared to traditional Java MapReduce. Scalding supports various data sources and sinks, map and reduce operations, joining pipes, and connecting to external systems like Hive and Elasticsearch. It also enables testing MapReduce jobs more easily through an in-memory approach. The future of Scalding may include support for real-time and hybrid batch/real-time systems through projects like Summingbird.
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant), by BigDataEverywhere
Jayesh Thakrar, Senior Systems Engineer, Conversant
The venerable HBase shell is often regarded as a simple utility for performing basic DDL and maintenance activities. However, it is in fact a powerful, interactive programming environment, primarily due to the JRuby engine under the covers. In this presentation, I'll describe its JRuby heritage and show some of the things that can be done with irb (the interactive Ruby shell), as well as show how to exploit JRuby and Java integration via concrete working examples. In addition, I will demonstrate how the shell can be used in Hadoop streaming to quickly perform complex, large-volume batch jobs.
ComputeFest 2012: Intro To R for Physical Sciences, by alexstorer
This document provides an introduction to the R programming language presented by Alex Storer at ComputeFest 2012. It discusses why R should be used over other languages like MATLAB and Python, provides examples of basic R syntax and functions, and walks through an example of loading climate data and creating plots to visualize rainfall anomalies over time. The goal is to provide attendees with a foundation of R basics while working through a real data analysis problem.
Codepot - Pig and Hive crash course, by Sages
A quick introduction to the Pig and Hive technologies from the Hadoop ecosystem. The presentation was given at the Codepot workshops on 29.08.2015 by Radosław Stankiewicz and Bartłomiej Tartanus.
The document discusses Scala concepts like implicit conversions and parameters as well as data processing frameworks like Hadoop, Cascading, and Scalding. It provides examples of using these concepts and frameworks to count the frequency of the first characters in words from a file using different approaches, from basic Scala to implementations with Cascading and Scalding.
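The first of the approaches described, counting first-character frequencies in basic Scala before moving on to Cascading and Scalding, fits in a few lines:

```scala
// Frequency of the first character of each word, plain-Scala version.
val text = "scala scalding cascading hadoop"

val firstCharFreq: Map[Char, Int] =
  text
    .split("\\s+")
    .toList
    .map(_.head)                       // first character of each word
    .groupBy(identity)                 // group identical characters
    .map { case (c, cs) => (c, cs.size) }
```

The Cascading and Scalding versions of the same exercise replace the in-memory list with a distributed source, but the transformation chain is recognizably the same.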
2017-02-07 - Elastic & Spark: building a search geo locator, by Alberto Paro
Presentation from the EsInRome event of 7 February 2017 - integrating Elasticsearch into a Big Data architecture and its ease of integration with Apache Spark.
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ..., by Lucidworks
This document summarizes options for ingesting logs into Apache Solr using Logstash and rsyslog. It discusses sending logs from Logstash or rsyslog to Solr, and processing logs with Logstash, rsyslog, or using rsyslog with Redis and Logstash before indexing with Solr. Configuration examples are provided for Logstash and rsyslog to ingest logs and structure them as JSON for indexing in Solr.
Making Structured Streaming Ready for Production, by Databricks
In mid-2016, we introduced Structured Streaming, a new stream processing engine built on Spark SQL that revolutionized how developers can write stream processing applications without having to reason about streaming. It allows users to express their streaming computations the same way they would express a batch computation on static data. The Spark SQL engine takes care of running it incrementally and continuously, updating the final result as streaming data continues to arrive. It truly unifies batch, streaming, and interactive processing in the same Datasets/DataFrames API and the same optimized Spark SQL processing engine.
The initial alpha release of Structured Streaming in Apache Spark 2.0 introduced the basic aggregation APIs and files as streaming source and sink. Since then, we have put in a lot of work to make it ready for production use. In this talk, Tathagata Das will cover in more detail about the major features we have added, the recipes for using them in production, and the exciting new features we have plans for in future releases. Some of these features are as follows:
- Design and use of the Kafka Source
- Support for watermarks and event-time processing
- Support for more operations and output modes
Speaker: Tathagata Das
This talk was originally presented at Spark Summit East 2017.
Hive provides an SQL-like interface to query and analyze large datasets stored in Hadoop. It allows users to model data as tables and analyze the data using SQL queries without needing to learn MapReduce programming. Hive generates MapReduce jobs behind the scenes to parallelize the processing and generate results. The system works by storing metadata about the tables in a metastore and then using this metadata to generate MapReduce jobs for queries. This allows Hive to provide a more programmer-friendly interface compared to raw MapReduce for working with large datasets.
The MapReduce job begins when a client program uploads configuration files to HDFS and notifies the JobTracker. The JobTracker assigns map tasks to idle TaskTrackers and the tasks extract input data, invoke the user-provided map function, and output intermediate key-value pairs. When the map tasks complete, reduce tasks are assigned to TaskTrackers to download intermediate data and invoke the reduce function to generate the final output. The framework is resilient to failures and can re-execute failed tasks as needed.
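The lifecycle above boils down to three data movements: map, shuffle (group intermediate pairs by key), and reduce. A single-process sketch of that data flow, with no JobTracker, TaskTrackers, or failure handling:

```scala
// The three phases of a MapReduce job, simulated in one process.
def mapPhase(lines: List[String]): List[(String, Int)] =
  lines.flatMap(_.split("\\s+")).map(w => (w, 1))        // user-provided map function

def shuffle(pairs: List[(String, Int)]): Map[String, List[Int]] =
  pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) } // group by key

def reducePhase(grouped: Map[String, List[Int]]): Map[String, Int] =
  grouped.map { case (k, vs) => (k, vs.sum) }            // user-provided reduce function

val out = reducePhase(shuffle(mapPhase(List("b a", "a"))))
```

In the real framework the shuffle is where intermediate data crosses the network from map tasks to reduce tasks; everything else is the same logic executed in parallel.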
The three main challenges in improving the public expenditure management system in ETTA are (1) developing a medium-term, program-based budget framework with performance targets, (2) strengthening the policy and budget review process, and (3) improving regular performance monitoring and reporting to increase accountability.
This document summarizes a talk given about Nokia's migration to Scala for its Places API. The key points are:
1) Nokia migrated its Places API codebase to Scala to take advantage of Scala's features like its powerful type system, immutable data structures, and functional programming capabilities.
2) The migration was done gradually over time while continuing to develop new features. They discovered many benefits of Scala along the way like improved test readability and JSON parsing capabilities.
3) Nokia uses Scala features like case classes, options, and functions to model data and add type safety to its codebase. This uncovered bugs that would have been hard to find in Java.
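A minimal sketch of the pattern described above, assuming a hypothetical `Place` type and `phoneOrDefault` helper (illustrative only, not Nokia's actual Places API code):

```scala
// Hypothetical example: modeling a place whose phone number may be absent.
case class Place(name: String, phone: Option[String])

// The compiler forces callers to handle the missing case explicitly,
// instead of risking a NullPointerException as plain Java strings would.
def phoneOrDefault(p: Place): String = p.phone.getOrElse("unknown")

val cafe    = Place("Cafe", Some("+358-123"))
val noPhone = Place("Kiosk", None)
```

Because `phone` is an `Option[String]` rather than a nullable `String`, forgetting the absent case is a compile-time error, which is exactly the class of bug the migration reportedly uncovered.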
CS442 - Rogue: A Scala DSL for MongoDB (jorgeortiz85)
Talk at Stanford's CS442 (High Productivity and Performance with Domain Specific Languages in Scala http://www.stanford.edu/class/cs442/), on Rogue. 5/24/2011
The Ring programming language version 1.5 book - Part 8 of 31 (Mahmoud Samir Fayed)
This document summarizes key classes and methods from the Ring web library (weblib.ring).
The Application class contains methods for encoding, decoding, cookies, and more. The Page class contains methods for generating common HTML elements and structures. Model classes like UsersModel manage data access and object relational mapping. Controller classes handle requests and coordinate the view and model.
Patterns for slick database applications (Skills Matter)
Slick is Typesafe's open source database access library for Scala. It features a collection-style API, compact syntax, type-safe, compositional queries and explicit execution control. Community feedback helped us to identify common problems developers are facing when writing Slick applications. This talk suggests particular solutions to these problems. We will be looking at reducing boiler-plate, re-using code between queries, efficiently modeling object references and more.
The Ring programming language version 1.7 book - Part 48 of 196 (Mahmoud Samir Fayed)
This document provides code examples and documentation for Ring's web library (weblib.ring). It describes classes and methods for generating HTML pages, forms, tables and other elements. This includes the Page class for adding common elements like text, headings, paragraphs etc., the Application class for handling requests, cookies and encoding, and classes representing various HTML elements like forms, inputs, images etc. It also provides an overview of how to create pages dynamically using View and Controller classes along with Model classes for database access.
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB (jorgeortiz85)
This document summarizes a presentation about Rogue, a Scala DSL for MongoDB. Some key points:
- Rogue allows type-safe querying of MongoDB with features like filters, pagination, and awareness of indexes.
- It uses phantom types to prevent issues like multiple selects or limits.
- Queries can be logged and validated.
- Future goals include iteratees for cursors, compile-time checking, and generating JavaScript for map-reduce.
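The phantom-type idea can be sketched in a few lines of plain Scala. This is an illustrative reconstruction, not Rogue's real API: the type parameter `L` exists only at compile time and records whether `limit` has already been set, so setting it twice fails to compile.

```scala
// Illustrative phantom-type sketch, not Rogue's actual API.
sealed trait Lim
sealed trait Unlimited extends Lim
sealed trait Limited extends Lim

// L never appears in any runtime value; it only tracks, in the types,
// whether limit() was already called on this query.
case class Query[L <: Lim](clauses: List[String]) {
  def where(c: String): Query[L] = Query(clauses :+ c)
  // limit is only callable while L is still Unlimited
  def limit(n: Int)(implicit ev: L =:= Unlimited): Query[Limited] =
    Query(clauses :+ s"limit $n")
}

val q = Query[Unlimited](Nil).where("age > 21").limit(10)
// q.limit(5)  // does not compile: the query is already Limited
```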
This document discusses refactoring Java code to Clojure using macros. It provides examples of refactoring Java code that uses method chaining to equivalent Clojure code using the threading macros (->> and -<>). It also discusses other Clojure features like type hints, the doto macro, and polyglot projects using Leiningen.
A seasoned veteran of the Swift language, Grégoire Lhotellier will present the sequences and collections of Apple's new language. He will brief us on the essentials of what you need to know about them and what they change compared to their Objective-C equivalents.
This document summarizes Apache Spark batch APIs, provides real-world examples of Spark jobs, addresses shortcomings of the Spark APIs, and outlines how to run and configure Spark jobs on AWS EMR. The document introduces the RDD, SQL, DataFrame and Dataset APIs in Spark and compares them. It then gives examples of enriching and shredding data with Spark. It discusses type-safe APIs to address issues in the default Spark APIs. Finally, it outlines the configuration needed to run optimized Spark jobs on EMR, including memory, parallelism and allocation settings.
Android mobile app developers, stuck in the era of Java 1.7, have been experimenting with other programming languages for some time. None has gained as much popularity as Kotlin. But is it really something revolutionary? After all, getters, setters and constructors can be generated with Lombok. Using Retrolambda we gain lambda support. And Android has recently gained support for Java 8.
So what makes Kotlin strong? Which language constructs and features make it worth using in your project? What impact will it have on application architecture and performance? Is Kotlin just a curiosity, or will it make you code more effectively? This presentation gives you the full set of information needed to answer all of these questions.
After migrating a three-year-old C# project to Java, we ended up with a significant portion of legacy code using lambdas in Java. We cover some of the good use cases, code that could have been written better, and the problems we had migrating from C#. At the end we look at the performance implications of using lambdas.
The document summarizes Scala, a functional programming language that runs on the Java Virtual Machine (JVM). It discusses Scala's core features like being object-oriented, type inference, and support for functional programming with immutable data structures and passing functions as parameters. It also provides examples of using Scala collections like List and Array, and functions like map, filter, flatMap, and foldLeft/reduceLeft. Finally, it demonstrates using Scala for domain-specific languages and shows examples of defining DSLs for querying and generating JavaScript.
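For example, the collection operations mentioned above compose directly (plain standard-library Scala, runnable in the REPL):

```scala
val nums = List(1, 2, 3, 4, 5)

// filter keeps matching elements; map transforms each one
val doubledEvens = nums.filter(_ % 2 == 0).map(_ * 2)

// flatMap maps each element to a collection and flattens the result
val pairs = nums.flatMap(n => List(n, -n))

// foldLeft threads an accumulator through the list, here a running sum
val sum = nums.foldLeft(0)(_ + _)
```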
AST - the only true tool for building JavaScript (Ingvar Stepanyan)
The document discusses working with code abstract syntax trees (ASTs). It provides examples of parsing code into ASTs using libraries like Esprima, querying ASTs using libraries like grasp-equery, constructing and transforming ASTs, and generating code from ASTs. It introduces aster, an AST-based code builder that allows defining reusable AST transformations as plugins and integrating AST-based builds into generic build systems like Grunt and Gulp. Aster aims to improve on file-based builders by working directly with ASTs in a streaming fashion.
The document describes MOBL, a programming language for building mobile web applications. MOBL aims to provide a small core language with large and extensible libraries. It includes built-in types, controls, and abstraction mechanisms like screens and functions. The language exposes low-level primitives while providing a native interface to external APIs. MOBL code can be deployed by concatenating, eliminating dead code, and minifying for client-side execution on mobile browsers. The language has been publicly released since January 2011 and sees over 1,000 visitors per day, with ongoing development focused on error handling, data evolution, documentation and libraries.
This document provides an overview of the Scala programming language. Some key points:
- Scala runs on the Java Virtual Machine and was created by Martin Odersky at EPFL.
- It has been around since 2003 and the current stable release is 2.7.7. Release 2.8 beta 1 is due out soon.
- Scala combines object-oriented and functional programming. It has features like pattern matching, actors, XML literals, and more that differ from Java. Everything in Scala is an object.
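As a small illustration of the pattern matching mentioned above (the `describe` helper is hypothetical):

```scala
// Pattern matching on both structure and values.
def describe(x: Any): String = x match {
  case 0               => "zero"                       // constant pattern
  case n: Int if n < 0 => "negative int"               // typed pattern with guard
  case (a, b)          => s"pair of $a and $b"         // tuple destructuring
  case s: String       => s"string of length ${s.length}"
  case _               => "something else"             // catch-all
}
```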
Type safe embedded domain-specific languages (Arthur Xavier)
Language is everything; it governs our lives: from our thought processes, our communication abilities and our understanding of the world, all the way up to law, politics, logic and programming. All of these domains of human experience are governed by different languages that talk to each other, and so should be your code. Haskell provides all the means necessary—and many more—to easily and safely use embedded small languages that are tailored to specific needs and business domains.
In this series of lectures and workshops, we will explore the whats, whys and hows of embedded domain-specific languages in Haskell, and how language-oriented programming can bring type safety, composability and simplicity to the development of complex applications.
1) The document discusses how a company gradually transitioned from a monolithic architecture to microservices using Apache Kafka as the backbone.
2) It outlines the steps taken, including defining service responsibilities and data models, using event sourcing and CQRS patterns, designing Kafka topics, and validating data.
3) The document emphasizes that Kafka should be the single source of truth for critical data and applications should be able to reprocess historical data from Kafka topics.
This document provides an introduction and overview of GraphQL. It begins with an example comparing making multiple REST API calls to fetch related data versus making a single GraphQL query. Key points covered include GraphQL's characteristics like being a query language that is agnostic to storage and returning only requested data, advantages over REST like fewer requests and tailored responses, and potential drawbacks like increased coupling. The document demonstrates GraphQL syntax and concepts like queries, mutations, fragments, and directives. It also discusses GraphQL adoption by companies like Facebook and how Liveperson evaluates it.
Kubernetes your tests! Automation with Docker on Google Cloud Platform (LivePerson)
Arik Lerner, Automation Team Leader, and Waseem Hamshawi, Automation Infra Developer, present how to build a large-scale automated testing platform by leveraging container orchestration over GCP, with the ability to scale out and provide fast feedback while maintaining a highly reliable test infrastructure.
The presentation includes a new approach to managing a scalable testing platform of distributed automated tests with Kubernetes and Docker over Google Cloud Platform.
Topics:
• GCP and Kubernetes introduction for automated testing
• Traditional Selenium Grid vs Selenium Standalone with Kubernetes and Docker for Web and Mobile tests
• Distributed and containerized testing environment over container cluster - different use cases
Ephemerals ("short-lived testing endpoints") is an open-source project by LivePerson which makes automation testing at large scale feel like a walk in the park.
In this Meetup, Yaar Reuveni (Team Leader) and Nir Hedvat (Software Engineer) from the LivePerson Data Platform R&D team talk about the journey from the early days of the data platform in production, with high friction and low awareness of issues, to a mature, measurable data platform that is visible and trustworthy.
In this Meetup, Arik Lerner, LivePerson team lead of Java Automation, Performance & Resilience, talks about how we measure our services with End2End testing, which has become one of the most critical monitoring tools at LivePerson.
Over 200K test runs per day provide statistics and insights into problems as they happen.
Arik goes through different topics and stages of the journey and shares details that led to the current results.
Among the topics on the menu: "The Awakens of the End2End Insights"
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How we debug our services. Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future logs App correlation with End2End data
• Our tools: Selenium, Jenkins and cutting-edge technologies such as Kafka & the ELK stack (Elasticsearch, Logstash and Kibana)
In this Meetup, Arik will host Ali AbuAli, NOC Team Leader, who will talk about E2E usage in his day-to-day work.
video: https://www.youtube.com/watch?v=IBC9gcYqNR4
In this talk Efim Dimenstein, Chief Architect at Liveperson will cover the rules and guidelines of building resilient systems, implementing them in real life and lessons learned during the process. The talk will focus on achieving resilience in real life and will feature a lot of examples and lessons learned from building systems currently in production running at extreme scale.
Efim will talk about:
· General resilience guidelines
· How they are implemented in practice
· What changes needed to be implemented to achieve resilience
· Lessons learned
· Summary
My name is Victor Perepelitsky. I'm an R&D Technical Leader at LivePerson, leading the 'Real Time Event Processing Platform' team.
In this Meetup I talked about the journey of creating the platform from scratch - challenges, design decisions, technology choices and more.
During the last 3 years the team has built the Real Time Event Processing Platform, which is currently running in production with thousands of new and migrated customers. It is built to handle hundreds of thousands of requests per second with low-latency response times (under 30 ms round trip).
I went through different topics and stages of this journey and shared details that led to specific choices and results.
“Stateful or Stateless”, “CEP”, “Rules engine”, “Automated performance testing”, “Locking”, “Timing” were a part of the menu.
In this meetup, Kobi Salant, Data Platform Technical Lead, and Vladi Feigin, Data System Architect, both from LivePerson, will talk about making scale a non-issue for real-time data apps.
Have you ever tried to build a system processing in real-time hundreds of thousands events per second and servicing more than 1M concurrent visitors?
We're going to talk about the LivePerson real-time stream processing solution doing exactly that. Learn how we empower digital call centers with insights for their critical decision making processes and never-ending efficiency goals.
In this talk Sergei Koren, Production Architect at LivePerson, will present HTTP/2, the official successor of HTTP/1.1, and how it will influence the Web as we know it.
Sergei will talk about:
- HTTP/2 history
- The major changes: the dos and don'ts
- Expected changes to Web as we use it today
- A proposed checklist for implementation: how and when, from a production point of view
Mobile app real-time content modifications using websockets (LivePerson)
We are happy to host Benny Weingarten-Gabbay, Senior Software Engineer at eBay at our offices.
Benny presents BetterContent, a tool that allows editing of an iOS mobile app in runtime, in a fun and easy way.
Read more on our DevBlog:
https://connect.liveperson.com/community/developers/blog/2015/03/26/mobile-app-real-time-content-modifications-using-websockets
Mobile SDK: Considerations & Best Practices (LivePerson)
Mobile SDKs are a great way to make your service or API easily consumable by the large number of developers out there looking for state of the art tools to make their apps stand out in the competitive marketplaces, but building a stable, compatible and successful SDK is quite a challenge.
In this talk we discuss the technical and design challenges involved in developing an efficient mobile SDK that is highly compatible with its host mobile app, and the various considerations we took into account and lessons we learned while designing and building LivePerson's native mobile SDK.
In this Meetup, Victor Perepelitsky, R&D Technical Leader at LivePerson leading the 'Real Time Event Processing Platform' team, will talk about Java 8: the Stream API, lambdas, and method references.
Victor will clarify what functional programming is and how you can use Java 8 to create better software.
Victor will also cover some pain points that Java 8 did not solve and show how you can work around them.
Amihay Zer-Kavod discusses how LivePerson uses Apache Avro to maintain consistent data across services. Avro provides a unified event schema and tools for serialization, enabling events to be sent between services and stored in Hadoop. LivePerson's use of an event-driven system with a common Avro schema allows over 320,000 events per second to be processed and over 2TB of data to be stored daily.
Apache Avro and Messaging at Scale in LivePerson (LivePerson)
This talk covers the challenges we tackled while building our new service-oriented system: what we realized were bad ideas, the better approaches to data consistency, how we used Apache Avro, and what other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.
In this lecture, Sergei Koren, system architect on the LivePerson production team, presents data & image compression and its effective usage in modern web and data flows.
Support Office Hour Webinar - LivePerson API (LivePerson)
Course description and agenda
LivePerson enables the creation of innovative applications designed to enhance and extend the functionality of your LivePerson solution, as well as cooperate with partners worldwide.
In this session we will demonstrate the LivePerson API offerings and the development process, and give a quick overview of the CHAT API and its basic usage. You will also have an opportunity to ask questions relevant to your business.
Host: Nitay Bartal
Date: July 17, 2014
Time: 11:00 AM - 12:00 PM EST
Duration: 60 minutes
Agenda:
- Leveraging LivePerson APIs to your benefit
- Overview of LivePerson API offerings
- Introduction to LivePerson Developers Network
- Overview of the Development process
- Tools and best practices
- Helpful tips and tricks
- Q&A
SIP - More than meets the eye
Speakers:
Ofer Cohen - VOIP Group Leader, LivePerson
Yossi Maimon - VOIP Technical Leader, LivePerson
An Introduction to the SIP protocol.
SIP's position in telecommunication networks and content services.
What is SIP:
The Session Initiation Protocol (SIP) is a signaling communications protocol, widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP) networks.
The protocol defines the messages that are sent between peers which govern establishment, termination and other essential elements of a call. SIP can be used for creating, modifying and terminating sessions consisting of one or several media streams. SIP can be used for two-party (unicast) or multiparty (multicast) sessions. Other SIP applications include video conferencing, streaming multimedia distribution, instant messaging, presence information, file transfer, fax over IP and online games.
(Source: Wikipedia)
Building Enterprise Level End-To-End Monitor System with Open Source Solution... (LivePerson)
Recently, LivePerson's Production moved from traditional monitoring to a new enterprise monitoring system using only open source tools.
Oren Katz (Production Monitoring Team Leader) and Ittiel Savir (Automation Team Leader) will describe the road from concept to implementation at LivePerson.
In the lecture we will talk about the chosen tools, the development process, tips, and how to avoid pitfalls.
Check out Oren's recent blog post on the Subject: http://bit.ly/16i5lDS
Ofer Ron, senior data scientist at LivePerson.
Recently, I had the pleasure of presenting an introduction to data science and data-driven products at DevconTLV.
I focused this talk on the basic ideas of data science, not the technology used, since far too often companies and developers rush to play around with "big data" technologies instead of first figuring out what questions they want to answer, and whether those answers form a successful product.
From a Kafkaesque Story to The Promised Land at LivePersonLivePerson
Ran Silberman, developer & technical leader at LivePerson, presents how LivePerson moved their data platform from a legacy ETL concept to the new "Data Integration" concept of our era.
Kafka is the main infrastructure that forms the backbone for data flow in the new Data Integration. That said, Kafka cannot come by itself: other supporting systems like Hadoop, Storm, and the Avro protocol were also integrated.
In this lecture Ran will describe the implementation in LivePerson and will share some tips and how to avoid pitfalls.
Read More: https://connect.liveperson.com/community/developers/blog/2013/11/21/from-a-kafkaesque-story-to-the-promised-land
Main Java [All of the Base Concepts].docx (adhitya5119)
This is part 1 of my Java learning journey. It covers custom methods, classes, constructors, packages, multithreading, try-catch blocks, finally blocks, and more.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing include infection, hyperpigmentation of the scar, contractures, and keloid formation.
Chapter-wise All Notes of First Year Basic Civil Engineering.pptx (Denish Jangid)
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to the objective, scope and outcome of the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality Standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid Waste. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary Air Pollutants, Harmful Effects of Air Pollution, Control of Air Pollution. Noise Pollution: Harmful Effects of Noise Pollution, Control of Noise Pollution. Global Warming & Climate Change, Ozone Depletion, Greenhouse Effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP (RAHUL)
This dissertation explores the particular circumstances of Mirzapur, a region located in the core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal environment for investigating changes in vegetation cover dynamics. Our study utilizes advanced technologies such as GIS (Geographic Information Systems) and remote sensing to analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus of extensive research and worry. As the global community grapples with swift urbanization, population expansion, and economic progress, the effects on natural ecosystems are becoming more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a significant role in maintaining the ecological equilibrium of our planet.
Land serves as the foundation for all human activities and provides the necessary materials for these activities. As the most crucial natural resource, its utilization by humans results in different 'land uses,' which are determined by both human activities and the physical characteristics of the land.
The utilization of land is impacted by human needs and environmental factors. In countries like India, rapid population growth and the emphasis on extensive resource exploitation can lead to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many centuries, evolving their structure over time and space. In the present era, these changes have accelerated due to factors such as agriculture and urbanization. Information regarding land use and cover is essential for various planning and management tasks related to the Earth's surface, providing crucial environmental data for scientific, resource management, and policy purposes, and for diverse human activities.
An accurate understanding of land use and cover is imperative for the development planning of any area. Consequently, a wide range of professionals, including earth system scientists, land and water managers, and urban planners, are interested in obtaining data on land use and cover changes, conversion trends, and other related patterns. The spatial dimensions of land use and cover support policymakers and scientists in making well-informed decisions, as alterations in these patterns indicate shifts in economic and social conditions. Monitoring such changes with the help of advanced technologies like remote sensing and Geographic Information Systems is crucial for coordinated efforts across different administrative levels.
Changes in vegetation cover refer to variations in the distribution, composition, and overall structure of plant communities across different temporal and spatial scales. These changes can occur naturally.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 8th and 9th June 2024, from 1 PM to 3 PM on each day.
Bangladesh Economic Review 2024 [Bangladesh Economic Review 2024 Bangla.pdf]: the complete Bangla e-book (PDF) for computer, tablet and smartphone, with a full table of contents, bookmark menu and hyperlink menu.
A very important book for all of us: it is a key subject for the BCS, bank and university admission exams and any competitive examination, and it also contains the latest data and statistics on Bangladesh.
As a citizen, you should know this information.
It is useful for the BCS and bank written exams, and will also be of great use to secondary and higher-secondary students.
2. Outline
Scalding: a Scala library that makes it easy to write MapReduce jobs on Hadoop.
We will talk about:
• MapReduce paradigm
• Writing Scalding jobs
• Improving jobs performance
• Typed API, testing
3. Getting a glimpse of some Scalding code
class TopKJob(args : Args) extends Job(args) {
  val exclusions = Tsv(args("exclusions"), 'exVisitorId)
  Tsv(args("input"), visitScheme)
    .filter('country){ country : String => country == "Israel" }
    .leftJoinWithTiny('visitorId -> 'exVisitorId, exclusions)
    .filter('exVisitorId){ isEx : String => isEx == null }
    .groupBy('section){ _.sortWithTake(visitScheme -> 'top, 5)(biggerSale) }
    .flattenTo[visitType]{ 'top -> visitScheme }
    .write(Tsv(args("output"), visitScheme))
}
4. Asking big data questions
Which questions will you ask?
What analysis will you do?
A possible approach:
Use the outliers to improve your product
• Most popular products on your site
• Visits that ended with the highest sale value
5. Asking big data questions
That is the problem of finding the top elements in the data.
6. Data analysis problem
Top elements problem
Input
• Data – arranged in records
• K – the number of top elements, or p – the percentage of top elements to output
• Order function – some ordering on the records
Output
• The K top records of our data, or the top p percentage, according to the order function
7. Algorithm flow
Top K elements problem
Read input records: input = 13, 55, 8, 2, 34, 89, 21, 8; K = 5
Sort records, take top K
Output top records: output = 89, 55, 34, 21, 13
8. Algorithm flow
Top K elements problem
Input = 13, 55, 8, 2, 34, 89, 21, 8
K = 5
Read input records → Sort records, take top K → Output top records
Output = 89, 55, 34, 21, 13
Scalding code
Tsv(args("input"), 'item)
  .groupAll{_.sortWithTake('item -> 'top, 5){
    (a : Int, b : Int) => a > b}}
  .write(Tsv(args("output"), 'top))
10. Algorithm flow
Top K elements problem
Read input records → Filter records that fit target population → Sort records, take top K → Output top records
11. Algorithm flow
Top K elements problem
Read input records → Filter records that fit target population → Divide to groups by site section
Then, per group: Sort records, take top K → Output top records
12. Algorithm flow
Top K elements problem
Read input records, and read the exclusion list from an external source
Filter records that fit target population
Filter out the visits from the exclusion list according to visitor id
Divide to groups by site section
Then, per group: Sort records, take top K → Output top records
14. MapReduce on Hadoop
HDFS blocks feed the mappers; each reducer writes an output file:
Block → Mapper: (k,v) → (k'1,v'1), (k'2,v'2)…
…
Block n → Mapper n: (k,v) → (k'1,v'1), (k'2,v'2)…
Reducer: (k', iterator(v')) → v''1, v''2… → Output file
Big bottleneck: the traffic between the mappers and the reducers.
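The (k,v) flow above can be mimicked on in-memory collections (a toy sketch of the map → shuffle → reduce phases, not the actual Hadoop API; word count is used as a stand-in job):

```scala
// Toy in-memory sketch of the MapReduce phases shown above.
// Mapper: each input record is mapped to (k', v') pairs.
def mapper(line: String): Seq[(String, Int)] =
  line.split("\\s+").filter(_.nonEmpty).map(w => (w, 1))

// Shuffle: group all (k', v') pairs by key k' -- this grouping is
// exactly the mapper-to-reducer traffic that forms the bottleneck.
def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
  pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

// Reducer: (k', iterator(v')) => aggregated v''.
def reducer(key: String, values: Seq[Int]): (String, Int) = (key, values.sum)

val blocks = Seq("a b a", "b c")        // two input "blocks"
val mapped = blocks.flatMap(mapper)     // runs on the mappers
val reduced = shuffle(mapped).map { case (k, vs) => reducer(k, vs) }
// Map(a -> 2, b -> 2, c -> 1)
```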
15. Efficient MapReduce
Which tool should we use? We want:
• Efficient execution – built-in performance-oriented features
• Full functionality
• Fast code writing
• Easy to alter, and easy maintenance
16. About Scalding
"Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop. It's similar to other MapReduce platforms like Pig and Hive, but offers a higher level of abstraction by leveraging the full power of Scala and the JVM."
– Twitter
17. Algorithm flow
Top K elements problem
Read input records, and read the exclusion list from an external source
Filter records that fit target population
Filter out the visits from the exclusion list according to visitor id
Divide to groups by site section
Then, per group: Sort records, take top K → Output top records
24. MapReduce joins
We would like to filter out the visits that appear in the exclusion list:

Visits:
visitorId | country | section | saleValue
1         | Israel  | …       | …
2         | Israel  | …       | …
3         | Israel  | …       | …

Exclusion list:
exVisitorId
3
1
25. MapReduce joins
We would like to filter out the visits that appear in the exclusion list:

Visits:
visitorId | country | section | saleValue
1         | Israel  | …       | …
2         | Israel  | …       | …
3         | Israel  | …       | …

Exclusion list:
exVisitorId
3
1

Left joining on visitorId → exVisitorId produces records with the scheme:
visitorId | country | section | saleValue | exVisitorId
26. MapReduce joins
We would like to filter out the visits that appear in the exclusion list.
The left join of the visits with the exclusion list (on visitorId → exVisitorId) gives:

visitorId | country | section | saleValue | exVisitorId
1         | Israel  | …       | …         | 1
2         | Israel  | …       | …         | null
3         | Israel  | …       | …         | 3
27. MapReduce joins
We would like to filter out the visits that appear in the exclusion list.
After the left join:

visitorId | country | section | saleValue | exVisitorId
1         | Israel  | …       | …         | 1
2         | Israel  | …       | …         | null
3         | Israel  | …       | …         | 3

Keeping only the records where exVisitorId is null leaves the visits that are not in the exclusion list:

visitorId | country | section | saleValue | exVisitorId
2         | Israel  | …       | …         | null
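The join-and-filter shown in the tables can be sketched on in-memory collections (a toy analogue of leftJoinWithTiny plus the null filter, not Scalding itself; the Visit case class is a hypothetical stand-in for the full visit scheme):

```scala
// Toy analogue of the left join with the exclusion list.
// A visit that matches no exclusion record gets exVisitorId = None (null).
case class Visit(visitorId: Int, country: String)

val visits = List(Visit(1, "Israel"), Visit(2, "Israel"), Visit(3, "Israel"))
val exclusions = Set(3, 1) // exVisitorId values

// Left join: attach Some(id) when the visitor is excluded, None otherwise.
val joined = visits.map(v =>
  (v, if (exclusions.contains(v.visitorId)) Some(v.visitorId) else None))

// Keep only the visits whose exVisitorId is null/None.
val kept = joined.collect { case (v, None) => v } // List(Visit(2, "Israel"))
```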
33. Efficient MapReduce
MapReduce performance issues:
1. Traffic bottleneck between the mappers and the reducers.
Here, the traffic bottleneck is when we take the top K elements.
• We would like to output from each mapper only the top elements of its input.
• How is sortWithTake implemented?
34. Efficient performance using Algebird
sortWithTake uses:
class PriorityQueueMonoid[T](max : Int)(implicit ord : Ordering[T])
  extends Monoid[PriorityQueue[T]]
Defined in Algebird (Twitter): abstract algebra for Scala, targeted at building aggregation systems.
35. Efficient performance using Algebird
sortWithTake uses:
class PriorityQueueMonoid[T](max : Int)(implicit ord : Ordering[T])
  extends Monoid[PriorityQueue[T]]
The PriorityQueue monoid:
• Zero – the empty PriorityQueue
• Plus – two PriorityQueues can be added, e.g. with K = 5:
  Q1: values = 55, 34, 21, 13, 8
  Q2: values = 100, 80, 60, 40, 20
  Q1 plus Q2: values = 100, 80, 60, 55, 40
• Plus is associative and commutative
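A minimal sketch of the "plus" operation on two bounded queues, mirroring the Q1 + Q2 example above (lists stand in for priority queues here; this is not Algebird's actual implementation):

```scala
// Sketch of a bounded "top-K" merge, mirroring PriorityQueueMonoid's plus.
// Keeping only the K largest values of the union makes plus associative
// and commutative, so partial results can be combined in any order.
def plus(max: Int)(q1: List[Int], q2: List[Int]): List[Int] =
  (q1 ++ q2).sorted(Ordering[Int].reverse).take(max)

val q1 = List(55, 34, 21, 13, 8)
val q2 = List(100, 80, 60, 40, 20)
plus(5)(q1, q2) // List(100, 80, 60, 55, 40)
```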
36. Efficient performance using Algebird
All Monoid aggregations can start in the Map phase, then finish in the Reduce phase. This decreases the amount of traffic from the mappers to the reducers.
This is performed implicitly when using Scalding's built-in aggregation functions:
average, sum, sizeAveStdev, histogram, approximateUniqueCount, sortWithTake
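Associativity is what allows the aggregation to start on the mappers: each mapper combines its own block, and the reducer only combines the partial results. A toy illustration with sum (plain Scala, with blocks standing in for mapper inputs):

```scala
// Because sum is a monoid operation (associative, with a zero),
// per-"mapper" partial sums combine to the same global result,
// and far fewer values cross the mapper-to-reducer boundary.
val data = (1 to 100).toList
val blocks = data.grouped(25).toList   // four mapper inputs
val partialSums = blocks.map(_.sum)    // computed map-side
val total = partialSums.sum            // finished reduce-side
// total == data.sum == 5050
```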
37. Improving performance
Our second performance issue:
What about performance loss due to an inefficient order of the map and reduce steps?
38. Top elements problem revisited
New problem definition:
Output the percentage p of top elements
instead of the fixed K top elements.
What is K?
K = p * count
39. Top %p of elements algorithm flow
What is K? K = p * count
Read input records → … → Divide to groups by site section
Then, per group: Count the number of records → Sort records, take top p → Output top records
40. Top %p of elements scalding job
class TopPJob(args : Args) extends Job(args) {
  // visitScheme after join with exclusion list
  val visits : RichPipe = …
  val counts = visits
    .groupBy('section){_.size('sectionSize)}
    .map('sectionSize -> 'sectionK){size : Int => (size * p).toInt}
  // taking top %p of elements
  visits.joinWithTiny('section -> 'section, counts)
    …
}
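The per-section logic of the job can be sketched on in-memory collections (toy data and a hypothetical percentage p, not the Scalding pipeline itself):

```scala
// Toy sketch of the top %p computation per site section.
// Each section's K is derived from that section's own record count.
val p = 0.5 // hypothetical percentage of top elements to keep

val visits = Map( // section -> sale values
  "books" -> List(10, 40, 30, 20),
  "music" -> List(5, 15))

val topP = visits.map { case (section, sales) =>
  val k = (sales.size * p).toInt // K = p * count, per section
  (section, sales.sorted(Ordering[Int].reverse).take(k))
}
// Map(books -> List(40, 30), music -> List(15))
```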
41. Flow graph
How will this flow be executed on Hadoop?
• How many MapReduce steps will be performed?
• What will be the input to each step?
• What logic will each contain?
42. Flow graph
How will this flow be executed on Hadoop?
• How many MapReduce steps will be performed?
• What will be the input to each step?
• What logic will each contain?
Run with --tool.graph!
44. Flow graph
Full flow in Cascading terminology:
• Reading input, join with exclusion list
• Split to counting
• Counting and calculating K
• Join with counting result
• Joining with K and sorting
46. Flow graph
And another graph – the MapReduce steps:
• First step: records input + exclusion list → group
• Second step: records input + exclusion list → group → output file (sink)
Each of the two steps reads the records input and joins it with the exclusion list.
47. Flow graph
Changing the join with the exclusion list to be performed only once – only a single line is added:
val visits : RichPipe =
  …
  .project(visitScheme)
  .forceToDisk

val counts = visits
  .groupBy('section){_.size('sectionSize)}
  …
visits.joinWithTiny('section -> 'section, counts)
  …
48. Flow graph
The new MapReduce steps:
• First step: records input + exclusion list → group
• Second step: group
• Third step: group → output file (sink)
49. Improving performance
We saw how:
• Writing Scalding jobs is simple, intuitive and fast.
• We can use external resources to improve the performance of our algorithms; Scalding performs some of this work implicitly for us.
• We can use Cascading, the library Scalding is built on, to understand the exact steps that will run.
50. Additional features
Some other features in Scalding:
• Typed API
TypedTsv[visitType](args("input"))     // TypedPipe[visitType]
  .filter(_._2 == "Israel")
  .toPipe(visitScheme)
  .toTypedPipe[visitType](visitScheme) // TypedPipe[visitType]
• Testing using JobTest
Give the input and get the output as Lists.
• Matrix API
Useful for running graph algorithms such as PageRank.
51. Scalding in LivePerson
How do we use Scalding in LivePerson?
• The main tool in the Data Science team
• Both for quick data exploration and in production jobs