In this meetup, Kobi Salant - Data Platform Technical Lead & Vladi Feigin - Data System Architect, both from LivePerson, will talk about making scale a non-issue for real-time data apps.
Have you ever tried to build a system that processes hundreds of thousands of events per second in real time and serves more than 1M concurrent visitors?
We're going to talk about the LivePerson real-time stream processing solution doing exactly that. Learn how we empower digital call centers with insights for their critical decision making processes and never-ending efficiency goals.
Introduction to Multimodal Language models with LLaVA. What are Multimodal models, how do they work, the LLaVA papers/models, and Image classification experiment.
Foundation of Generative AI: Study Materials Connecting the Dots by Delving i... - Fordham University
In recent years, the field of artificial intelligence (AI) has witnessed remarkable advancements, particularly in the domain of generative models. Generative AI, a subset of machine learning, focuses on developing systems that can create novel and realistic content, ranging from text, speech, and images to multimodal content. This burgeoning field has demonstrated unprecedented potential to revolutionize various industries, making it imperative to introduce dedicated study materials on the foundation of Generative AI. With the increasing integration of Generative AI in various industries, professionals with expertise in this field are in high demand, and thus we believe that the publication of the slides is extremely important to meet the current need. The proposed outline aims to equip students with the knowledge and skills required to harness the creative power of AI and navigate the ethical implications associated with generative technologies. * Materials used in this PPT were collected from Wikipedia, Google Image, and OpenAI GPT. No copyright is claimed by the author.
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence? - Bernard Marr
Could GPT-3 be the most powerful artificial intelligence ever developed? When OpenAI, a research business co-founded by Elon Musk, released the tool recently, it created a massive amount of hype. Here we look through the hype and outline what it is and what it isn’t.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Abstract: This workshop introduces the basic concepts of Bayes' Theorem. Concepts covered are the difference between independent and conditional probabilities, Bayes' formulas, and examples.
Level: Fundamental
Requirements: No prior programming or statistics knowledge is required.
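As a small illustration of the kind of example such a workshop might cover, the sketch below applies Bayes' formula, P(H|E) = P(E|H) P(H) / P(E), to a screening-test scenario. The function name and all the numbers are hypothetical and chosen only for illustration; they do not come from the workshop.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of the evidence, over both hypotheses
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.4f}")

# When the evidence is independent of the hypothesis, P(E | H) == P(E | not H)
# and the posterior equals the prior.
unchanged = bayes_posterior(prior=0.3, likelihood=0.5, false_positive_rate=0.5)
```

Even with a fairly accurate test, the posterior stays well below the test's sensitivity because the prior is small. That gap is the usual motivating example for conditional probability.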
The Text Classification slides contain research results on possible natural language processing algorithms. Specifically, they give a brief overview of the natural language processing steps, the common algorithms used to transform words into meaningful vectors/data, and the algorithms used to learn and classify the data.
To learn more about RAX Automation Suite, visit: www.raxsuite.com
Tokenization:
→ It will segment an input character sequence into tokens.
→ Tokens can be words, numbers, punctuation marks, etc.
Tokenizer Types
→ Whitespace Tokenizer - Non whitespace sequences are identified as tokens
→ Simple Tokenizer - A character class tokenizer, sequences of the same character class are tokens
→ Learnable Tokenizer - A maximum entropy tokenizer, detects token boundaries based on probability model
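The first two tokenizer types above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the function names are hypothetical, and the character classes chosen for the simple tokenizer (letters, digits, everything else that is not whitespace) are an assumption. The learnable tokenizer is omitted because it needs a trained probability model.

```python
import re


def whitespace_tokenize(text):
    """Whitespace tokenizer: every run of non-whitespace characters is a token."""
    return text.split()


# Assumed character classes: letters, digits, and any other non-space symbol.
_SAME_CLASS_RUN = re.compile(r"[^\W\d_]+|\d+|(?:[^\w\s]|_)+")


def simple_tokenize(text):
    """Simple tokenizer: each run of characters from the same class is a token."""
    return _SAME_CLASS_RUN.findall(text)


sentence = "Mr. Smith paid $42.50, didn't he?"
print(whitespace_tokenize(sentence))
print(simple_tokenize(sentence))
```

The whitespace tokenizer keeps `$42.50,` as one token, while the character-class tokenizer splits it into `$`, `42`, `.`, `50` and `,`. That difference is the main trade-off between the two.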
The Ultimate Guide to Implementing Conversational AI - Celine Rayner
What exactly is conversational AI? How is it different than chatbots? How does it work, and why should you implement it?
In the most comprehensive guide ever written on this topic, we cover every single facet of successful, pain-free conversational AI implementation and maintenance in 2021.
Presentation by Hans Korbee of Agentschap NL during the
network meeting of 23 May 2012
Platform for indoor air quality in primary schools in Gelderland
Provinciehuis, Noordgalerij, Arnhem, 9:00-14:00
Given the current economic conditions, many businesses are struggling and may need to take action to not only remain profitable but to remain sustainable. Some organizations may be considering a reduction in force. When exploring the option of a reduction in force, it is important that corporate counsel is involved. Corporate counsel will be able to advise on the legal implications of the reduction, to protect the interests of both the employer and the employees. The following ten points are designed to facilitate the discussion with your legal department when having a reduction in force conversation.
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ... - confluent
MQ, ETL and ESB middleware are often used as integration backbone between legacy applications, modern microservices and cloud services. This introduces several challenges and complexities like point-to-point integration or non-scalable architectures. This session discusses how to build a completely event-driven streaming platform leveraging Apache Kafka’s open source messaging, integration and streaming components to leverage distributed processing, fault-tolerance, rolling upgrades and the ability to reprocess events. Learn the differences between an event-driven streaming platform leveraging Apache Kafka and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Apache Kafka - Scalable Message-Processing and more! - Guido Schmutz
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge numbers of messages between a source and a target.
This session will start with an introduction to Apache Kafka and present its role in a modern data / information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka in the Oracle Stack, with products such as Golden Gate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Kai Wähner
Learn the differences between an event-driven streaming platform and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today’s world where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as integration backbone between any kind of microservice, legacy application or cloud service to move data via SOAP / REST Web Services or other technologies. Stream Processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.
This session discusses how teams in different industries solve these challenges by building a native streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows them to build and deploy independent, mission-critical real-time streaming applications and microservices. The architecture leverages distributed processing and fault-tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and Stream Processing are still key functionality but can be realized natively in real time instead of using additional ETL, ESB or Stream Processing tools.
Data Streaming with Apache Kafka & MongoDB - EMEA - Andrew Morgan
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Webinar: Data Streaming with Apache Kafka & MongoDB - MongoDB
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
The Netflix Way to deal with Big Data Problems - Monal Daxini
Netflix is a data-driven company with a unique culture. Come take a holistic tour of the Big Data ecosystem and of how Netflix culture catalyzes the development of systems. Then ogle at how we quickly evolved and scaled the event pipeline to 1 trillion events per day and over 1.4 PB of event data, without service disruption and with a small team.
More info: https://cnfl.io/cloud-native-experience-for-kafka-in-cloud | Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s streaming infrastructure built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi... - confluent
Watch this talk here: https://www.confluent.io/online-talks/using-apache-kafka-to-optimize-real-time-analytics-financial-services-iot-applications
When it comes to the fast-paced nature of capital markets and IoT, the ability to analyze data in real time is critical to gaining an edge. It’s not just about the quantity of data you can analyze at once, it’s about the speed, scale, and quality of the data you have at your fingertips.
Modern streaming data technologies like Apache Kafka and the broader Confluent platform can help detect opportunities and threats in real time. They can improve profitability, yield, and performance. Combining Kafka with Panopticon visual analytics provides a powerful foundation for optimizing your operations.
Use cases in capital markets include transaction cost analysis (TCA), risk monitoring, surveillance of trading and trader activity, compliance, and optimizing profitability of electronic trading operations. Use cases in IoT include monitoring manufacturing processes, logistics, and connected vehicle telemetry and geospatial data.
This online talk will include in depth practical demonstrations of how Confluent and Panopticon together support several key applications. You will learn:
-Why Apache Kafka is widely used to improve performance of complex operational systems
-How Confluent and Panopticon open new opportunities to analyze operational data in real time
-How to quickly identify and react immediately to fast-emerging trends, clusters, and anomalies
-How to scale data ingestion and data processing
-Build new analytics dashboards in minutes
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co... - HostedbyConfluent
Event-driven application architectures are becoming increasingly common as a large number of users demand more interactive, real-time, and intelligent responses. Yet it can be challenging to decide how to capture and perform real-time data analysis and deliver differentiating experiences. Join experts from Confluent and AWS to learn how to build Apache Kafka®-based streaming applications backed by machine learning models. Adopting the recommendations will help you establish repeatable patterns for high performing event-based apps.
Aljoscha Krettek offers a very short introduction to stream processing before diving into writing code and demonstrating the features in Apache Flink that make truly robust stream processing possible, with a focus on correctness and robustness in stream processing.
All of this will be done in the context of a real-time analytics application that we’ll be modifying on the fly based on the topics we’re working through, as Aljoscha exercises Flink’s unique features, demonstrates fault recovery, clearly explains why event time is such an important concept in robust, stateful stream processing, and covers the features you need in a stream processor to do robust, stateful stream processing in production.
We’ll also use a real-time analytics dashboard to visualize the results we’re computing in real time, allowing us to easily see the effects of the code we’re developing as we go along.
Topics include:
* Apache Flink
* Stateful stream processing
* Event time versus processing time
* Fault tolerance
* State management in the face of faults
* Savepoints
* Data reprocessing
http://www.oreilly.com/pub/e/3764
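The distinction between event time and processing time in the topic list above can be illustrated without any streaming framework. The sketch below is plain Python, not Flink's API; the function name and the sample records are hypothetical. It assigns each record to a tumbling window by the timestamp carried in the record, so a record that arrives late still lands in the window in which it actually occurred.

```python
from collections import defaultdict


def count_by_event_time(events, window_seconds):
    """Count records per tumbling window, keyed by each record's own timestamp."""
    counts = defaultdict(int)
    for event_time, _payload in events:
        # Align the timestamp down to the start of its window
        window_start = (event_time // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


# (event_time_seconds, payload) listed in arrival order.
# The record stamped 7 arrives after the record stamped 12, i.e. it is late.
arrivals = [(3, "a"), (12, "b"), (7, "c"), (14, "d")]
result = count_by_event_time(arrivals, window_seconds=10)
print(result)
```

Grouping by arrival order instead would put the late record in the wrong window. A production stream processor also has to decide when a window is complete, typically with watermarks, which this batch-style sketch leaves out.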
Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. Monal Daxini details how they used Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. He'll also share plans for offering Stream Processing as a Service for all of Netflix to use.
Streaming Data Ingest and Processing with Apache Kafka - Attunity
Apache™ Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. It offers high throughput, reliability and replication. To manage growing data volumes, many companies are leveraging Kafka for streaming data ingest and processing.
Join experts from Confluent, the creators of Apache™ Kafka, and the experts at Attunity, a leader in data integration software, for a live webinar where you will learn how to:
-Realize the value of streaming data ingest with Kafka
-Turn databases into live feeds for streaming ingest and processing
-Accelerate data delivery to enable real-time analytics
-Reduce skill and training requirements for data ingest
The recorded webinar on slide 32 includes a demo using automation software (Attunity Replicate) to stream live changes from a database into Kafka and also includes a Q&A with our experts.
For more information, please go to www.attunity.com/kafka.
In this presentation, Guido Schmutz talks about Apache Kafka, Kafka Core, Kafka Connect, Kafka Streams, Kafka and "Big Data"/"Fast Data" ecosystems, the Confluent Data Platform, and Kafka in architecture.
Data Streaming with Apache Kafka & MongoDB - confluent
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ... - HostedbyConfluent
Apache Kafka users who want to leverage Google Cloud Platform's (GCP's) data analytics platform and open source hosting capabilities can bridge their existing Kafka infrastructure on-premise or in other clouds to GCP using Confluent's replicator tool and managed Kafka service on GCP. Using actual customer examples and a reference architecture, we'll showcase how existing Kafka users can stream data to GCP and use it in popular tools like Apache Beam on Dataflow, BigQuery, Google Cloud Storage (GCS), Spark on Dataproc, and TensorFlow for data warehousing, data processing, data storage, and advanced analytics using AI and ML.
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal... - HostedbyConfluent
Indeed is consciously transforming our monolith applications to microservices. Moving monoliths from on-premise to a hybrid architecture is a non-trivial endeavor. As we know, it is a marathon and never a race: we do not refactor all of our applications at once, but progress incrementally toward resilience with the cloud.
By partnering with Confluent we were able to procedurally migrate many of our workloads both critical and non-critical primarily using Kafka by adopting a data domain driven approach. In this talk, you will learn,
1. How to piece complex puzzles when you have bits of information
2. What questions to ask to prioritize feature improvements
3. How to enumerate impact
4. How to let your vendor know what is valuable
With over 20 years of experience working with various databases and datastores, I will share real examples of success and failures and lessons we learned when working with Confluent Cloud by:
- Implementing strategies
- Addressing short and long term value - for both technical and business
- Using methodical approaches to form roadmaps
If you’re in discussions surrounding engineering platforms at your organization then this talk is for you. If you are a data driven engineering organization with solid leadership with sound decisions behind it, join us for this talk and let’s have a discussion.
Kubernetes your tests! automation with docker on google cloud platform - LivePerson
Arik Lerner, Automation Team Leader, and Waseem Hamshawi, Automation Infra Developer, present how to build a large scale automated testing platform by leveraging containers orchestration over GCP, with the ability to scale out and provide fast feedback while maintaining a highly reliable test infrastructure.
The presentation includes a new approach to managing a scalable testing platform of distributed automated tests with Kubernetes and Docker over Google Cloud Platform.
Topics:
• GCP and Kubernetes introduction for automated testing
• Traditional Selenium Grid vs Selenium Standalone with Kubernetes and Docker for Web and Mobile tests
• Distributed and containerized testing environment over container cluster - different use cases
Ephemerals - "Short-lived Testing Endpoints". An open-source project by LivePerson that makes automation testing at large scale like a "walk in the park".
In this Meetup, Yaar Reuveni, Team Leader, and Nir Hedvat, Software Engineer, from the LivePerson Data Platform R&D team, will talk about the journey we made from the early days of the data platform in production, with high friction and low awareness of issues, to a mature, measurable data platform that is visible and trustworthy.
In this Meetup, Arik Lerner, LivePerson Team Lead of Java Automation, Performance & Resilience, will talk about how we measure our services with End2End testing, which has become one of the most critical monitoring tools at LP.
Over 200K test runs per day provide statistics and insights into problems as they happen.
Arik will go through different topics and stages of the journey and share details that led to the current results.
Topics on the menu include: The Awakening of the End2End Insights
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How we debug our services. Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future: correlating application logs with End2End data
• Our tools: Selenium, Jenkins and cutting-edge technologies such as Kafka & ELK (Elasticsearch, Logstash and Kibana)
In this Meetup, Arik will host Ali AbuAli, NOC Team Leader, who will talk about how he uses e2e in his day-to-day work.
video: https://www.youtube.com/watch?v=IBC9gcYqNR4
In this talk, Efim Dimenstein, Chief Architect at LivePerson, will cover the rules and guidelines of building resilient systems, implementing them in real life, and lessons learned during the process. The talk will focus on achieving resilience in real life and will feature many examples and lessons learned from building systems currently in production running at extreme scale.
Efim will talk about:
· General resilience guidelines
· How they are implemented in practice
· What changes needed to be implemented to achieve resilience
· Lessons learned
· Summary
My name is Victor Perepelitsky. I'm an R&D Technical Leader at LivePerson, leading the 'Real Time Event Processing Platform' team.
In this Meetup I talked about the journey of creating the platform from scratch: challenges, design decisions, technology choices and more.
During the last 3 years the team has built a Real Time Event Processing Platform, which is currently running in production with thousands of new and migrated customers. It is built to handle hundreds of thousands of requests per second with low-latency response times (under 30 ms round trip).
I went through different topics and stages of this journey and shared details that led to specific choices and results.
“Stateful or Stateless”, “CEP”, “Rules engine”, “Automated performance testing”, “Locking”, “Timing” were a part of the menu.
In this talk, Sergei Koren, Production Architect at LivePerson, will present HTTP/2, the official successor of HTTP 1.1, and how it will influence the Web as we know it.
Sergei will talk about:
- HTTP/2 history
- The major changes: what they do and what they don’t
- Expected changes to the Web as we use it today
- Proposed checklist for implementation: how and when; from Production point of view.
Mobile app real-time content modifications using websockets - LivePerson
We are happy to host Benny Weingarten-Gabbay, Senior Software Engineer at eBay at our offices.
Benny presents BetterContent, a tool that allows editing of an iOS mobile app in runtime, in a fun and easy way.
Read more on our DevBlog:
https://connect.liveperson.com/community/developers/blog/2015/03/26/mobile-app-real-time-content-modifications-using-websockets
Mobile SDK: Considerations & Best Practices - LivePerson
Mobile SDKs are a great way to make your service or API easily consumable by the large number of developers out there looking for state of the art tools to make their apps stand out in the competitive marketplaces, but building a stable, compatible and successful SDK is quite a challenge.
In this talk we cover the technical and design challenges involved in developing an efficient mobile SDK that is highly compatible with its host mobile app, the various considerations we took into account, and the lessons we’ve learned while designing and building LivePerson’s native mobile SDK.
In this Meetup, Victor Perepelitsky, R&D Technical Leader at LivePerson leading the 'Real Time Event Processing Platform' team, will talk about 'Java 8', 'Stream API', 'Lambda', and 'Method reference'.
Victor will clarify what functional programming is and how you can use Java 8 to create better software.
Victor will also cover some pain points in functionality that Java 8 did not solve and show how you can work around them.
If you are building a service-oriented system and you want to build it for scale as well as flexibility, there are a few questions you need to make sure are asked and answered regarding the data interchange between services and the offline persistency of services' data. Questions such as:
- How can I change a service API without breaking other services?
- How do I keep data from services consistent over time?
This talk covers the challenges we tackled while building our new service-oriented system, summarizing what we realized would be bad ideas and what the better approaches to data consistency are.
It includes a dive into the Apache Avro technology and how we used it.
It also covers the other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
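The first question above, changing a service's data format without breaking its consumers, is commonly handled with schema evolution: a reader fills in defaults for fields that older writers did not send and ignores fields it does not know. The sketch below illustrates that idea in plain Python. It is not the Avro library's API and not the implementation described in the talk; the schema contents, field names and the `resolve` helper are hypothetical.

```python
# Sentinel marking a field that has no default (hypothetical convention)
REQUIRED = object()

# Hypothetical reader schema, version 2: field name -> default value
READER_SCHEMA_V2 = {
    "visitor_id": REQUIRED,
    "event_type": REQUIRED,
    "channel": "web",  # added in v2 with a default, so v1 records still resolve
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not REQUIRED:
            resolved[field] = default  # writer predates this field
        else:
            raise ValueError(f"missing required field: {field}")
    return resolved  # fields unknown to the reader are dropped


# A record produced by an older service version, with a field v2 no longer reads
v1_record = {"visitor_id": "v-17", "event_type": "page_view", "legacy_flag": True}
resolved = resolve(v1_record, READER_SCHEMA_V2)
print(resolved)
```

The design choice this illustrates is that new fields are added only with defaults, so producers and consumers can be upgraded independently.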
Apache Avro and Messaging at Scale in LivePerson - LivePerson
This talk covers the challenges we tackled while building our new service-oriented system, summarizing what we realized would be bad ideas, what the better approaches to data consistency are, how we used Apache Avro technology, and what other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.
In this lecture, Sergei Koren, System architect at LivePerson production team presents data & image compression and its effective usage in modern web and data flows.
Support Office Hour Webinar - LivePerson API - LivePerson
Course description and agenda
LivePerson enables the creation of innovative applications designed to enhance and extend the functionality of your LivePerson solution, as well as cooperate with partners worldwide.
In this session we will demonstrate the LivePerson API offerings and the development process, and give a quick overview of the Chat API and its basic usage. You will also have an opportunity to ask questions relevant to your business.
Host: Nitay Bartal
Date: July 17, 2014
Time: 11:00 AM - 12:00 PM EST
Duration: 60 minutes
Agenda:
- Leveraging LivePerson APIs to your benefit
- Overview of LivePerson API offerings
- Introduction to LivePerson Developers Network
- Overview of the Development process
- Tools and best practices
- Helpful tips and tricks
- Q&A
SIP - More than meets the eye
Speakers:
Ofer Cohen - VOIP Group Leader, LivePerson
Yossi Maimon - VOIP Technical Leader, LivePerson
An Introduction to the SIP protocol.
SIP Position in telecommunication networks and the content services.
What is SIP:
The Session Initiation Protocol (SIP) is a signaling communications protocol, widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP) networks.
The protocol defines the messages that are sent between peers which govern establishment, termination and other essential elements of a call. SIP can be used for creating, modifying and terminating sessions consisting of one or several media streams. SIP can be used for two-party (unicast) or multiparty (multicast) sessions. Other SIP applications include video conferencing, streaming multimedia distribution, instant messaging, presence information, file transfer, fax over IP and online games.
(Source: Wikipedia)
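To make the message structure concrete, here is a minimal sketch that splits a SIP request into its request line (method, request URI, protocol version) and its headers. The sample message is hypothetical and deliberately trimmed: a real INVITE carries further mandatory headers such as Via and Max-Forwards, and this parser ignores details like header folding and repeated headers.

```python
def parse_sip_request(raw):
    """Split a SIP request into method, request URI, version and headers."""
    # Headers end at the first empty line; anything after it is the body
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, version, headers


# Hypothetical, trimmed request; addresses and Call-ID are placeholders
message = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: <sip:alice@example.com>\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: hypothetical-call-id-1\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
method, uri, version, headers = parse_sip_request(message)
print(method, uri, version)
print(headers["CSeq"])
```

An INVITE like this one starts session establishment, and a BYE request with the same Call-ID would terminate it.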
My name is Neta Barkay , and I'm a data scientist at LivePerson.
I'd like to share with you a talk I presented at the Underscore Scala community on "Efficient MapReduce using Scalding".
In this talk I reviewed why Scalding fits big data analysis, how it enables writing quick and intuitive code with the full functionality vanilla MapReduce has, without compromising on efficient execution on the Hadoop cluster. In addition, I presented some examples of Scalding jobs which can be used to get you started, and talked about how you can use Scalding's ecosystem, which includes Cascading and the monoids from Algebird library.
Read more & Video: https://connect.liveperson.com/community/developers/blog/2014/02/25/scalding-reaching-efficient-mapreduce
Building Enterprise Level End-To-End Monitor System with Open Source Solution... - LivePerson
Recently, LivePerson's Production moved from traditional monitoring to a new enterprise monitoring system using only open source tools.
Oren Katz (Production Monitoring Team Leader) and Ittiel Savir (Automation Team Leader) will describe the road from concept to implementation at LivePerson.
In the lecture we will talk about the chosen tools, the development process, tips, and how to avoid pitfalls.
Check out Oren's recent blog post on the Subject: http://bit.ly/16i5lDS
Ofer Ron, senior data scientist at LivePerson.
Recently, I've had the pleasure of presenting an introduction to Data Science and data driven products at DevconTLV
I focused this talk around the basic ideas of data science, not the technology used, since I thought that far too many times companies and developers rush to play around with "big data" related technologies, instead of figuring out what questions they want to answer, and whether these answers form a successful product.
From a Kafkaesque Story to The Promised Land at LivePersonLivePerson
Ran Silberman, developer & technical leader at LivePerson presents how LivePerson moved their data platform from a legacy ETL concept to new "Data Integration" concept of our era.
Kafka is the main infrastructure that holds the backbone for data flow in the new Data Integration. Having that said, Kafka cannot come by itself. Other supporting systems like Hadoop, Storm, and Avro protocol were also integrated.
In this lecture Ran will describe the implementation in LivePerson and will share some tips and how to avoid pitfalls.
Read More: https://connect.liveperson.com/community/developers/blog/2013/11/21/from-a-kafkaesque-story-to-the-promised-land
1. DLD. Tel-Aviv. 2015
Making Scale a Non-Issue
for Real-Time Data Apps
Vladi Feigin, LivePerson
Kobi Salant, LivePerson
2. Agenda
Intro
About LivePerson
Digital Engagements
Call Center Use Case
Architecture
Zoom-In
3. Bio
Vladi Feigin
System Architect in LivePerson
18 years in software development
Interests: distributed computing, data, analytics and martial arts
4. Bio
Kobi Salant
Data Platform Tech Lead in LivePerson
25 years in software development
Interests: application performance, traveling and coffee
5. LivePerson
We do Digital Engagements
Agile and very technological
Real Big Data and Analytics company
Really cool place to work in
One of the SaaS pioneers
6 Data Centers across the world
Founded in 1995, a public company since 2000 (NASDAQ: LPSN)
More than 18,000 customers worldwide
More than 1000 employees
7. We are Big Data
1.4 Million concurrent visits
1 Million events per second
2 billion site visits per month
27 million live engagements per month
Data freshness SLA (RT flow): up to 5 seconds
12. Call Center Operating
Digital engagement requires operating a call center in the most efficient way
How to operate a call center in the most efficient way?
Provide operational metrics in real time
What are the challenges?
Huge scale, load peaks, real-time calculations, high data freshness SLA
16. Data Producers. Requirements
Real time
“Five nines” persistence
Small footprint
No interference with service
Multiple producers & platforms
Monolithic to service oriented: many more services
17. Data Producers. Lessons learned
Hundreds of services
Complex rollouts
Minimal logic to avoid painful fixes
Audit streaming? Split to buckets
Real time and “five nines” persistence are incompatible
18. Data Producers. Flow
[Diagram: the producer sends messages to Kafka and persists them to a local file on disk; the Kafka Bridge sends the persisted messages to Kafka. Labels: Fast Topic, Consistent Topic, Kafka Resilience, Real-time Customers, Offline Customers]
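One possible reading of this flow, sketched in Python. The slide only names the pieces (fast topic, consistent topic, local file, Kafka Bridge), so everything else here is an assumption for illustration: `FakeKafka` is an in-memory stand-in rather than a real client, the topic names `fast` and `consistent` are hypothetical, and the spool file is a plain line-per-message JSON file.

```python
import json
import os


class FakeKafka:
    """In-memory stand-in for a Kafka client (hypothetical, for illustration only)."""

    def __init__(self):
        self.topics = {}
        self.available = True

    def send(self, topic, message):
        if not self.available:
            raise ConnectionError("Kafka unavailable")
        self.topics.setdefault(topic, []).append(message)


class Producer:
    """Service-side producer with a fast path and a consistent path."""

    def __init__(self, kafka, spool_path):
        self.kafka = kafka
        self.spool_path = spool_path

    def send_fast(self, message):
        """Fast path: best effort. The message is lost if Kafka is down."""
        try:
            self.kafka.send("fast", message)
            return True
        except ConnectionError:
            return False

    def send_consistent(self, message):
        """Consistent path: only append to local disk, never wait for Kafka."""
        with open(self.spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(message) + "\n")


def run_bridge(kafka, spool_path):
    """Bridge: forward spooled messages to Kafka and keep whatever was not sent."""
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, encoding="utf-8") as spool:
        pending = [json.loads(line) for line in spool if line.strip()]
    sent = 0
    for message in pending:
        try:
            kafka.send("consistent", message)
            sent += 1
        except ConnectionError:
            break  # Kafka is down: stop and retry on the next bridge run
    # Rewrite the spool so it holds only the messages still waiting.
    with open(spool_path, "w", encoding="utf-8") as spool:
        for message in pending[sent:]:
            spool.write(json.dumps(message) + "\n")
    return sent
```

The point of the split is that the fast path never blocks on disk or retries, while the consistent path survives a Kafka outage because the bridge can replay the local file later.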
19. Data Model Framework
Why Avro:
Schema based evolution
Performance - Untagged bytes
HDFS ecosystem support
Lessons Learned:
Schema evolution breaks
Big schema (ours is over 65k) not recommended
Avoid deep nesting and multiple unions
Need a framework
Chaos – Non-Schema, space delimited
Order – Avro Schema
20. Framework Flow
1. Event is created according to Avro schema version 3.5
2. Schema is registered into the repository (once)
3. Value 3.5 is written to the header
4. Event is encoded with schema version 3.5 and added to the message
5. Message is sent to Kafka
6. Message is read by the consumer
7. Header is read from the message
8. Schema is retrieved from the repository according to the schema version
9. Event is decoded using the proper Avro schema
10. Decoded event is processed
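The ten steps can be sketched as a small Python example. This is a simplified illustration under stated assumptions, not the LivePerson framework: the repository is an in-memory dict, the header layout (one byte major, one byte minor) is invented, and JSON stands in for Avro binary encoding so the example needs no external library. As with Avro's untagged encoding, only the values are written and the field names come from the schema.

```python
import json
import struct

# Hypothetical in-memory schema repository, keyed by (major, minor) version.
SCHEMA_REPOSITORY = {}

# Assumed header layout: one unsigned byte for major, one for minor.
HEADER = struct.Struct(">BB")


def register_schema(version, schema):
    """Step 2: register the schema once; later calls for the same version are no-ops."""
    SCHEMA_REPOSITORY.setdefault(version, schema)


def encode_message(event, version):
    """Steps 3-4: write the version to the header, then append the encoded event."""
    schema = SCHEMA_REPOSITORY[version]
    missing = [name for name in schema["fields"] if name not in event]
    if missing:
        raise ValueError(f"event is missing fields: {missing}")
    # Values only, in schema order (JSON is a stand-in for Avro binary encoding).
    payload = json.dumps([event[name] for name in schema["fields"]]).encode("utf-8")
    return HEADER.pack(*version) + payload


def decode_message(message):
    """Steps 7-9: read the header, fetch the matching schema, decode the event."""
    version = HEADER.unpack_from(message)
    schema = SCHEMA_REPOSITORY[version]
    values = json.loads(message[HEADER.size:].decode("utf-8"))
    return version, dict(zip(schema["fields"], values))


register_schema((3, 5), {"fields": ["visitor_id", "event_type"]})
message = encode_message({"visitor_id": "v1", "event_type": "chat"}, (3, 5))
version, event = decode_message(message)
```

Because the consumer looks the schema up by the version in the header, producers on different schema versions can share a topic without the consumer guessing the layout.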
21. Apache Kafka
More than 15 billion events a day
More than 1 million events per second
Hundreds of producers & consumers
Why Kafka?
Scale where traditional MQs fail
Industry standard for big data log messaging
Reliable, flexible and easy to use
Deployment:
We have 15 clusters across the world
Our biggest cluster has 8 nodes with more than 6TB (Avro + Kafka compression)
Maximum retention of 72 hours
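A quick back-of-the-envelope check on these figures, in Python. It treats "15 billion events a day" and "1 million events per second" as round lower bounds and assumes the per-second figure is a peak rate. The slide does not say so, so the ratio below is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day):
    """Average rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


def peak_to_average_ratio(peak_events_per_second, events_per_day):
    """How far the peak rate sits above the daily average."""
    return peak_events_per_second / average_events_per_second(events_per_day)


def events_retained(events_per_day, retention_hours):
    """Events held in Kafka at once, given the retention window."""
    return events_per_day * retention_hours / 24


average = average_events_per_second(15_000_000_000)      # about 173,611 events/s
ratio = peak_to_average_ratio(1_000_000, 15_000_000_000)  # 5.76
retained = events_retained(15_000_000_000, 72)            # 45 billion events
```

Under these assumptions the clusters have to be sized for a peak almost six times the daily average, which is the "load peaks" challenge from the call center slide in numbers.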
22. Apache Kafka. Lessons Learned
Scale horizontally for hardware resources and vertically for throughput
Look at trends of network & IO & Kafka's JMX statistics
[Chart labels: Partitions, Servers, Bytes in]
23. Apache Kafka. Lessons Learned cont.
Know your data and message sizes:
Large messages can break you
Data growth can overfill your capacity
Set the right configuration
Adding or removing a broker is not trivial
Decide on single or multiple topics
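To make "large messages can break you" concrete, here is a minimal sketch of a size guard applied before a message is handed to the producer. The 1,000,000-byte limit and the `MessageTooLargeError` name are assumptions for illustration; in a real deployment the effective limit comes from the Kafka configuration (for example the broker's `message.max.bytes` and the producer's `max.request.size`), and JSON stands in for the actual encoding.

```python
import json

# Assumed limit for illustration; align it with the actual broker and producer settings.
MAX_MESSAGE_BYTES = 1_000_000


class MessageTooLargeError(ValueError):
    """Raised when a serialized event exceeds the configured size limit."""


def serialize_checked(event, max_bytes=MAX_MESSAGE_BYTES):
    """Serialize an event and reject it early if it is over the limit."""
    payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
    if len(payload) > max_bytes:
        raise MessageTooLargeError(
            f"message is {len(payload)} bytes, limit is {max_bytes}"
        )
    return payload
```

Rejecting or splitting an oversized event at the producer is cheaper than discovering it when a broker or consumer refuses the message.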
24. Apache Storm
Why Storm?
Growing community with good integration to Kafka
At the time, it was the leading product
Easy development and customization
The POC was successful
Deployment:
We have 6 clusters across the world
Our biggest cluster has more than 30 nodes
We have 20 topologies on a single cluster
Uptime of months for a single topology
26. Apache Storm. Lessons learned
Develop SDK and educate R&D
Where did my topology run last week? What is my overtime capacity?
Know your bolts: they must return a timely answer
Coding is easy, performance is hard
Use isolation
Capacity
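The capacity figure that the Storm UI shows per bolt is commonly described as the fraction of a time window the bolt spent executing tuples. A minimal Python sketch of that calculation follows; the function names and the 0.8 warning threshold are assumptions for illustration, not values from the talk.

```python
def bolt_capacity(executed, avg_execute_latency_ms, window_ms):
    """Fraction of the window spent executing tuples (close to 1.0 means saturated)."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * avg_execute_latency_ms / window_ms


def is_near_saturation(capacity, threshold=0.8):
    """Assumed rule of thumb: flag bolts above the threshold for more parallelism."""
    return capacity >= threshold


# 120,000 tuples at 4 ms each over a 10-minute window
capacity = bolt_capacity(120_000, 4.0, 10 * 60 * 1000)
```

Tracking this number over time per bolt is one way to answer the capacity question above before a topology falls behind.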
27. Apache Storm. Lessons learned cont.
Use local shuffling
Use Ack
[Diagram: Worker A and Worker B, each with KAFKA SPOUT, FILTER BOLT, WRITER BOLT, ACKER BOLT and COMM BOLT, connected by local emit]
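The idea behind local shuffling (Storm's `localOrShuffleGrouping`) is to prefer a target task in the same worker process, so tuples avoid a network hop, and to fall back to an ordinary shuffle only when no local task exists. The Python sketch below illustrates only that selection rule; the task-to-worker mapping and the function name are hypothetical, and this is not Storm's implementation.

```python
import random


def choose_target_task(source_worker, target_tasks, rng=random):
    """Pick a target task, preferring tasks that run in the source's worker.

    target_tasks maps task id -> worker name.
    """
    local = [task for task, worker in target_tasks.items() if worker == source_worker]
    # Local tasks win; otherwise shuffle across every task of the target bolt.
    candidates = local or list(target_tasks)
    return rng.choice(candidates)


# Two filter-bolt tasks in worker A, one in worker B.
filter_tasks = {1: "A", 2: "B", 3: "A"}
picked = choose_target_task("A", filter_tasks)  # always task 1 or 3
```

With one spout, filter bolt and writer bolt per worker, as in the diagram, every emit stays inside the worker and only acking and communication bolts may need to cross the wire.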
28. Summary
No one-size-fits-all solution
Ask product for a clearly defined SLA
Separate fast and consistent data flows - they don’t merge!
Use schema for a data model - keep it flat and small
Kafka rules! It’s reliable and fast - use it
Storm has its toll. For some use cases we would be using Spark Streaming today
29. THANK YOU!
We are hiring
http://www.liveperson.com/company/careers
Q/A