My name is Neta Barkay, and I'm a data scientist at LivePerson.
I'd like to share with you a talk I presented at the Underscore Scala community on "Efficient MapReduce using Scalding".
In this talk I reviewed why Scalding fits big data analysis and how it enables writing quick, intuitive code with the full functionality of vanilla MapReduce, without compromising on efficient execution on the Hadoop cluster. In addition, I presented some example Scalding jobs that can be used to get you started, and talked about how you can use Scalding's ecosystem, which includes Cascading and the monoids from the Algebird library.
Read more & Video: https://connect.liveperson.com/community/developers/blog/2014/02/25/scalding-reaching-efficient-mapreduce
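To give a flavor of the kind of job the talk walks through, here is a minimal word-count sketch in Scalding's fields-based API. It is illustrative only; the argument names and field names are not taken from the talk.

import com.twitter.scalding._

class WordCountJob(args : Args) extends Job(args) {
  TextLine(args("input"))                                                          // read raw text lines
    .flatMap('line -> 'word) { line : String => line.toLowerCase.split("\\s+") }  // one record per word
    .groupBy('word) { _.size }                                                     // count occurrences of each word
    .write(Tsv(args("output")))                                                    // write (word, count) pairs
}

Such a job would typically be launched through com.twitter.scalding.Tool with --hdfs, --input and --output arguments.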
2. Outline
Scalding - a Scala library that makes it easy to write MapReduce jobs in Hadoop.
We will talk about:
• MapReduce paradigm
• Writing Scalding jobs
• Improving job performance
• Typed API, testing
3. Getting a glimpse of some Scalding code
class TopKJob(args : Args) extends Job(args) {
  val exclusions = Tsv(args("exclusions"), 'exVisitorId)
  Tsv(args("input"), visitScheme)
    .filter('country){country : String => country == "Israel"}
    .leftJoinWithTiny('visitorId -> 'exVisitorId, exclusions)
    .filter('exVisitorId){isEx : String => isEx == null}
    .groupBy('section){_.sortWithTake(visitScheme -> 'top, K)(biggerSale)}
    .flattenTo[visitType]{'top -> visitScheme}
    .write(Tsv(args("output"), visitScheme))
}
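The job above refers to visitScheme, visitType, K and biggerSale, which are defined elsewhere in the deck's project. A plausible sketch of these definitions, purely hypothetical and only meant to make the example self-contained, could be:

val visitScheme = ('visitorId, 'country, 'section, 'saleValue)         // fields of a visit record
type visitType = (String, String, String, Double)                      // visitorId, country, section, saleValue
val K = 5                                                              // number of top records to keep; the deck's examples use K = 5
def biggerSale(a : visitType, b : visitType) : Boolean = a._4 > b._4   // order visits by sale value, descending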
4. Asking big data questions
Which questions will you ask?
What analysis will you do?
A possible approach:
Use the outliers to improve your product
• Most popular products on your site
• Visits that ended with the highest sale value
5. Asking big data questions
That is the problem of finding the top elements in the data.
6. Data analysis problem
Top elements problem
Input
• Data – arranged in records
• K – number of top elements or p – percentage of top elements to output
• Order function – some ordering on the records
Output
• K top records of our data or top p percentage according to the order function
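Stated as plain Scala, ignoring data size for a moment, the problem is simply the following; this is an illustrative sketch, not code from the deck.

// Return the k largest records according to the given ordering.
def topK[T](records : Iterable[T], k : Int)(implicit ord : Ordering[T]) : List[T] =
  records.toList.sorted(ord.reverse).take(k)

topK(List(13, 55, 8, 2, 34, 89, 21, 8), 5)   // List(89, 55, 34, 21, 13)

The rest of the talk is about doing this at scale, when the data does not fit on one machine.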
7. Algorithm flow
Top K elements problem
Flow: read input records → sort records, take top K → output top records
Example: Input = 13, 55, 8, 2, 34, 89, 21, 8; K = 5; Output = 89, 55, 34, 21, 13
8. Algorithm flow
Top K elements problem - the same flow as Scalding code:
Tsv(args("input"), 'item)
  .groupAll{_.sortWithTake('item -> 'top, 5){
    (a : Int, b : Int) => a > b}}
  .write(Tsv(args("output"), 'top))
10. Algorithm flow
Top K elements problem
Flow: read input records → filter records that fit the target population → sort records, take top K → output top records
11. Algorithm flow
Top K elements problem
Flow: read input records → filter records that fit the target population → divide into groups by site section → per group: sort records, take top K → output top records
12. Algorithm flow
Top K elements problem
Flow: read input records, and read the exclusion list from an external source → filter records that fit the target population → filter out the visits that appear in the exclusion list, according to visitor id → divide into groups by site section → per group: sort records, take top K → output top records
14. MapReduce on Hadoop
Each HDFS block is read by a mapper: (k, v) → (k'1, v'1), (k'2, v'2)…
The mapper outputs are shuffled to the reducers: each reducer receives (k', iterator(v')), emits v''1, v''2…, and writes an output file.
The big bottleneck is the traffic between the mappers and the reducers.
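To make the (k, v) flow concrete, here is a schematic word-count pair of functions. The signatures are illustrative Scala, not Hadoop's actual Java API.

// Mapper: each input record (byte offset, line) is turned into (word, 1) pairs.
def map(offset : Long, line : String) : Seq[(String, Int)] =
  line.split("\\s+").toSeq.map(word => (word, 1))

// Reducer: all values sharing the same key arrive together and are folded into one result.
def reduce(word : String, counts : Iterator[Int]) : (String, Int) =
  (word, counts.sum)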
15. Efficient MapReduce
Which tool should we use? We want:
• Efficient execution - built-in performance-oriented features
• Full functionality
• Fast code writing - easy to alter and easy to maintain
16. About Scalding
"Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop. It's similar to other MapReduce platforms like Pig and Hive, but offers a higher level of abstraction by leveraging the full power of Scala and the JVM."
– Twitter
17. Algorithm flow
Top K elements problem
Flow: read input records, and read the exclusion list from an external source → filter records that fit the target population → filter out the visits that appear in the exclusion list, according to visitor id → divide into groups by site section → per group: sort records, take top K → output top records
24. MapReduce joins
We would like to filter out the visits that appear in the exclusion list.
Visits:
visitorId | country | section | saleValue
1         | Israel  | …       | …
2         | Israel  | …       | …
3         | Israel  | …       | …
Exclusion list:
exVisitorId
3
1
25.-26. MapReduce joins
Left join of the visits with the exclusion list on 'visitorId -> 'exVisitorId:
visitorId | country | section | saleValue | exVisitorId
1         | Israel  | …       | …         | 1
2         | Israel  | …       | …         | null
3         | Israel  | …       | …         | 3
27. MapReduce joins
Filtering for exVisitorId == null keeps only the visits that do not appear in the exclusion list:
visitorId | country | section | saleValue | exVisitorId
2         | Israel  | …       | …         | null
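In Scalding this is the leftJoinWithTiny plus null-filter from the job on slide 3. Isolated, the fragment looks roughly like this, with visits and exclusions as in that job:

visits
  .leftJoinWithTiny('visitorId -> 'exVisitorId, exclusions)   // the small exclusion list is replicated to the mappers
  .filter('exVisitorId){isEx : String => isEx == null}        // keep only visits with no match in the exclusion list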
33. Efficient MapReduce
MapReduce performance issues:
1. Traffic bottleneck between the mappers and the reducers.
• The traffic bottleneck is when we take the top K elements.
• We would like to output from each mapper only the top elements of its input.
How is sortWithTake implemented?
34. Efficient performance using Algebird
sortWithTake uses:
class PriorityQueueMonoid[T](max : Int)(implicit
ord : Ordering[T]) extends Monoid[PriorityQueue[T]]
Defined in Algebird (Twitter): abstract algebra for Scala, targeted at building aggregation systems.
35. Efficient performance using Algebird
sortWithTake uses:
class PriorityQueueMonoid[T](max : Int)(implicit
ord : Ordering[T]) extends Monoid[PriorityQueue[T]]
PriorityQueue case: the monoid's zero is an empty PriorityQueue, and two PriorityQueues can be added (keeping only the K largest values):
K = 5
Q1: values = 55, 34, 21, 13, 8
Q2: values = 100, 80, 60, 40, 20
Q1 plus Q2: values = 100, 80, 60, 55, 40
The plus operation is associative and commutative.
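A minimal sketch of the idea (not Algebird's actual implementation): a bounded "top K" container whose plus keeps only the K largest values, reproducing the numbers on this slide.

case class TopK(k : Int, values : List[Int]) {
  // Merge two partial results, keeping only the k largest values overall.
  def plus(other : TopK) : TopK =
    TopK(k, (values ++ other.values).sorted(Ordering[Int].reverse).take(k))
}

val q1 = TopK(5, List(55, 34, 21, 13, 8))
val q2 = TopK(5, List(100, 80, 60, 40, 20))
q1.plus(q2)   // TopK(5, List(100, 80, 60, 55, 40)) - the same result in any grouping or order

Because plus is associative and commutative, partial results can be merged in any order, which is what lets the aggregation start on the mappers.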
36. Efficient performance using Algebird
All Monoid aggregations can start in Map phase, then
finish in Reduce phase. This decreases the amount
of traffic from the mappers to the reducers.
Performed implicitly when using Scalding built-in
aggregation functions:
average
sum
sizeAveStdev
histogram
approximateUniqueCount
sortWithTake
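For example, a monoid aggregation inside a groupBy can be partially computed on the map side before the shuffle. The exact method signatures vary across Scalding versions, so the snippet below is a sketch using the deck's field names:

visits
  .groupBy('section){_.sum[Double]('saleValue -> 'totalSale)}   // partial sums are computed map-side, then combined on the reducers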
37. Improving performance
Our second performance issue:
What about the performance lost due to an inefficient order of the map and reduce steps?
38. Top elements problem revisited
New problem definition:
Output the percentage p of top elements instead of the fixed K top elements.
What is K?
K = p * count
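With illustrative numbers: if p is 1% and a group holds 20,000 records, then K for that group is:

val p = 0.01                // top 1% of records
val count = 20000           // records in the group
val k = (p * count).toInt   // 200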
39. Top %p of elements algorithm flow
What is K? K = p * count
Flow: read input records → … → divide into groups by site section → per group: count the number of records → per group: sort records, take top p → output top records
40. Top %p of elements scalding job
class TopPJob(args : Args) extends Job(args) {
  // visitScheme after join with exclusion list
  val visits : RichPipe = …
  val counts = visits
    .groupBy('section){_.size('sectionSize)}
    .map('sectionSize -> 'sectionK){size : Int => {size * p}.toInt}
  // taking top %p of elements
  visits.joinWithTiny('section -> 'section, counts)
  …
}
41. Flow graph
How will this flow be executed on Hadoop?
• How many MapReduce steps will be performed?
• What will be the input to each step?
• What logic will each contain?
42. Flow graph
Run with --tool.graph!
44. Flow graph
Full flow in Cascading terminology: reading input and join with the exclusion list → split to counting → counting and calculating K → join with the counting result → joining with K and sorting
46. Flow graph
And another graph, this time of the MapReduce steps: there are two steps, and both the first and the second step read the records input and the exclusion list as sources before grouping, so the join with the exclusion list is performed twice; the second step writes the output file to the sink.
47. Flow graph
Changing the join with the exclusion list to be performed only once:
val visits : RichPipe =
  …
  .project(visitScheme)
  .forceToDisk        // only a single line is added!
val counts = visits
  .groupBy('section){_.size('sectionSize)}
  …
visits.joinWithTiny('section -> 'section, counts)
  …
48. Flow graph
The new MapReduce steps: there are now three steps. Only the first step reads the records input and the exclusion list; the second and third steps group its intermediate output, and the output file is written at the end of the third step.
49. Improving performance
We saw how:
• Writing Scalding jobs is simple, intuitive and fast.
• We can use external resources to improve the performance of our algorithms; Scalding performs some of this work implicitly for us.
• We can use the Cascading library that Scalding is built on to understand the exact steps that will run.
50. Additional features
Some other features in Scalding
• Typed API
TypedTsv[visitType](args("input"))
  .filter(_._2 == "Israel")              // TypedPipe[visitType]
  .toPipe(visitScheme)
  .toTypedPipe[visitType](visitScheme)   // TypedPipe[visitType]
• Testing using JobTest
Give the input and get the output as Lists
• Matrix API
Useful for running graph algorithms such as PageRank
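As a rough sketch of the JobTest idea, reusing the hypothetical WordCountJob from the beginning of this page: inputs and expected outputs are plain in-memory lists, so the job logic can be checked without a cluster. The argument values are illustrative.

import com.twitter.scalding._

JobTest(new WordCountJob(_))
  .arg("input", "inputFile")
  .arg("output", "outputFile")
  .source(TextLine("inputFile"), List((0, "hello world"), (1, "hello scalding")))   // fake input lines
  .sink[(String, Long)](Tsv("outputFile")) { buffer =>                              // captured output records
    assert(buffer.toSet == Set(("hello", 2L), ("scalding", 1L), ("world", 1L)))
  }
  .run
  .finish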
51. Scalding in LivePerson
How do we use Scalding in LivePerson?
• The main tool in the Data Science team
• Both for quick data exploration, and in production jobs