Amihay Zer-Kavod discusses how LivePerson uses Apache Avro to maintain consistent data across services. Avro provides a unified event schema and tools for serialization, enabling events to be sent between services and stored in Hadoop. LivePerson's use of an event-driven system with a common Avro schema allows over 320,000 events per second to be processed and over 2TB of data to be stored daily.
Apache Avro and Messaging at Scale in LivePerson - LivePerson
This talk covers the challenges we tackled while building our new service-oriented system: what we realized would be bad ideas, what the better approaches to data consistency are, how we used Apache Avro, and what other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Igor Anishchenko
Odessa Java TechTalks
Lohika - May, 2012
Let's take a step back and compare data serialization formats, of which there are plenty. What are the key differences between Apache Thrift, Google Protocol Buffers and Apache Avro? Which is "The Best"? The truth of the matter is that they are all very good, and each has its own strong points. The answer is as much a matter of personal choice as of understanding the historical context of each and correctly identifying your own, individual requirements.
JSON has by now become a regular part of most applications and services. Do we, however, really want to transfer human-readable information, or would we rather have a binary protocol that is as debuggable as JSON? CBOR, the Concise Binary Object Representation, offers the best of JSON plus an extremely efficient binary representation.
http://www.cbor.io
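As a rough illustration of that claim (not part of the original talk description): with Jackson's CBOR backend, the same data model can be written as JSON or as CBOR just by swapping the factory. The class name and sample data below are invented for the example.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;
import java.util.Map;

public class CborSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper json = new ObjectMapper();                  // text encoding
        ObjectMapper cbor = new ObjectMapper(new CBORFactory()); // binary encoding, same data model

        Map<String, Object> event = Map.of("id", "evt-1", "time", 1409876543210L);

        // Same object, two encodings; CBOR is typically noticeably smaller.
        System.out.println("JSON bytes: " + json.writeValueAsBytes(event).length);
        System.out.println("CBOR bytes: " + cbor.writeValueAsBytes(event).length);

        // Round-trip: CBOR decodes back into the same map structure.
        System.out.println(cbor.readValue(cbor.writeValueAsBytes(event), Map.class));
    }
}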
A look at a modern programming language and how it can enhance the productivity and workflow of developers while promising higher execution performance. We will be talking about metaprogramming, functional programming and type systems in a pragmatic way, mixed with some examples from the `vibe.d` web framework.
By Yazan Dabain - Senior System Engineer, Arabia Weather
Do you want to upgrade your GWT application or write a sizable web application? Dart is the efficient choice.
As a brief example, check out http://lightningdart.com
This presentation is updated October 2015 for Silicon Valley Code Camp
This is the sixth set of slightly updated slides from a Perl programming course that I held some years ago.
I want to share it with everyone looking for intransitive Perl-knowledge.
A table of contents for all presentations can be found at i-can.eu.
The source code for the examples and the presentations in ODP format are on https://github.com/kberov/PerlProgrammingCourse
An introduction to Apache Flume that comes from Hadoop Administrator Training delivered by GetInData.
Apache Flume is a distributed, reliable, and available service for collecting, aggregating, and moving large amounts of log data. By reading these slides, you will learn about Apache Flume: its motivation, its most important features, the architecture of Flume, its reliability guarantees, agent configuration, integration with the Apache Hadoop ecosystem, and more.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... - StampedeCon
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and the creation and maintenance of Hive databases and tables become much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
2014 Fast Casual and Quick Serve Restaurant Fast Food Restaurant Tour In Sydney Australia. Over 30 restaurants are highlighted with a brief description of their menus
21 people attended the July 2014 program meeting hosted by BDPA Cincinnati chapter. The topic was 'Open Source Tools and Resources'. The guest speaker was Greg Greenlee (Blacks In Technology).
'Open source' refers to a computer program in which the source code is available to the general public for use or modification from its original design. Open source code is typically created as a collaborative effort in which programmers improve upon the code and share the changes within the community. Open source sprouted in the technological community as a response to proprietary software owned by corporations. Over 85% of enterprises are using open source software. Managers are quickly realizing the benefit that community-based development can have on their businesses. This month, we put on our geek hats and detective gloves to learn how we can monitor our computers’ environments using open source tools. This meetup covered some of the most popular ‘Free and Open Source Software’ (FOSS) tools used to monitor various aspects of your computer environment.
Apache Beam (formerly the Google Cloud Dataflow SDK) is a unified model and set of language-specific SDKs for defining and executing data processing workflows. You design pipelines that simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
This presentation introduces the Beam programming model and how you can use it to design your pipelines, transporting PCollections and applying PTransforms. You will see how the same code is "translated" to a target runtime thanks to a specific runner. You will also get an overview of the current roadmap, with interesting new features.
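To make that model concrete, here is a minimal, hedged sketch (not code from the presentation) of a Beam pipeline in Java: PCollections flow through PTransforms, and the same code runs on whichever runner the options select. The class name and sample data are invented.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BeamSketch {
    public static void main(String[] args) {
        // The runner (direct, Flink, Spark, Dataflow) is picked via options;
        // the pipeline definition below stays the same for all of them.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        PCollection<String> words = pipeline.apply(Create.of("avro", "kafka", "avro", "beam"));

        // Two PTransforms: count elements, then format each count as a string.
        words.apply(Count.perElement())
             .apply(MapElements.into(TypeDescriptors.strings())
                               .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()));

        pipeline.run().waitUntilFinish();
    }
}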
Enforcing API Design Rules for High Quality Code Generation - Tim Burks
[Co-presented with Mike Kistler, Architect for SDK Generation for the Watson Client Libraries]
The OpenAPI Specification is emerging as the leading standard for describing REST APIs. A key factor in the popularity of OpenAPI is the broad array of open source tools that it enables that create, manipulate, and publish documentation and code from OpenAPI descriptions. In this talk, we describe a configurable and extensible open source linter for OpenAPI that we are using to solve API code generation problems at IBM and Google. Our linter is based on Gnostic, an open source framework for working with API descriptions that was developed at Google and is available on GitHub.
OpenAPI itself is language-agnostic and is being used to generate code in a large set of popular programming languages. This generated code includes both server-side "stubs" and client libraries that are sometimes called software development kits (SDKs). IBM has begun to employ code generation for the Watson Developer Cloud SDKs and other companies are doing similar things, including Google, which generates client libraries from Google-specific API description formats. These teams have found that the quality of SDKs generated from API descriptions depends heavily on the quality of the descriptions. This goes far beyond mere syntactic compliance with a specification -- it involves proper API design, naming, and adherence to organization-wide design patterns. To address this, many companies have created API design guides. Some companies, such as Google and Microsoft, have published their API design guides externally, while others like IBM have kept theirs as internal documents. But to this point, verifying compliance with an API design guide has largely been a manual task. What is needed, we believe, is a configurable and extensible linter to check OpenAPI descriptions for conformance with rules derived from API design guides.
Data engineering - Stl Big Data IDEA user group - Adam Doyle
Modern day Data Engineering requires creating reliable data pipelines, architecting distributed systems, designing data stores, and preparing data for other teams.
We’ll describe a year in the life of a Data Engineer who is tasked with creating a streaming data pipeline and touch on the skills necessary to set one up using Apache Spark.
Slides from the April 2019 meeting of the St. Louis Big Data IDEA meetup.
Big Data, Data Lake, Fast Data - Data Serialization Formats - Guido Schmutz
The concept of "Data Lake" is in everyone's mind today. The idea of storing all the data that accumulates in a company in a central location and making it available sounds very interesting at first. But Data Lake can quickly turn from a clear, beautiful mountain lake into a huge pond, especially if it is inexpertly entrusted with all the source data formats that are common in today's enterprises, such as XML, JSON, CSV or unstructured text data. Who, after some time, still has an overview of which data, which format and how they have developed over different versions? Anyone who wants to help themselves from the Data Lake must ask themselves the same questions over and over again: what information is provided, what data types do they have and how has the content changed over time?
Data serialization frameworks such as Apache Avro and Google Protocol Buffers (Protobuf), which enable platform-independent data modeling and data storage, can help. This talk will discuss the possibilities of Avro and Protobuf and show how they can be used in the context of a data lake and what advantages can be achieved. Support for Avro and Protobuf in Big Data and Fast Data platforms is also covered.
SplunkLive! Frankfurt 2018 - Data Onboarding Overview - Splunk
Presented at SplunkLive! Frankfurt 2018:
Splunk Data Collection Architecture
Apps and Technology Add-ons
Demos / Examples
Best Practices
Resources and Q&A
Standardizing on a single N-dimensional array API for Python - Ralf Gommers
MXNet workshop Dec 2020 presentation on the array API standardization effort ongoing in the Consortium for Python Data API Standards - see data-apis.org
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix uses the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses; how they set up and keep schemas in sync between Hive, Presto, Redshift & Spark; and how they make access easy for their data scientists. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
Try to imagine the amount of time and effort it would take you to write a bug-free script or application that will accept a URL, port scan it, and for each HTTP service that it finds, create a new thread and perform black box penetration testing while impersonating a BlackBerry 9900 smartphone. While you're thinking, here's how you would do it in Hackersh:
"http://localhost" \
-> url \
-> nmap \
-> browse(ua="Mozilla/5.0 (BlackBerry; U; BlackBerry 9900; en) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.1.0.346 Mobile Safari/534.11+") \
-> w3af
Meet Hackersh (“Hacker Shell”) – A new, free and open source cross-platform shell (command interpreter) with built-in security commands and Pythonect-like syntax.
Aside from being interactive, Hackersh is also scriptable with Pythonect. Pythonect is a new, free, and open source general-purpose dataflow programming language based on Python, written in Python. Hackersh is inspired by Unix pipeline, but takes it a step forward by including built-in features like remote invocation and threads. This 120 minute lab session will introduce Hackersh, the automation gap it fills, and its features. Lots of demonstrations and scripts are included to showcase concepts and ideas.
LOGGING - About Needles in the Modern Haystack
https://www.macsysadmin.se/program.html
Every once in a while you'll read "collect the log files". How will this work with your cloud service, identity provider, and SaaS solution? What are the challenges, and what options are at hand when monitoring macOS effectively for compliance?
In this session we talk about practices in storing and retrieving event information for monitoring, and review applications to build and process rich audit trails. This session aims to share our experiences made with commercial and open source backends applied to various client scenarios.
Apache Drill is a new Apache incubator project. Its goal is to provide a distributed system for interactive analysis of large-scale datasets. Inspired by Google's Dremel technology, it aims to process trillions of records in seconds. We will cover the goals of Apache Drill, its use cases and how it relates to Hadoop, MongoDB and other large-scale distributed systems. We'll also talk about details of the architecture, points of extensibility, data flow and our first query languages (DrQL and SQL).
The Data Lake Engine - Data Microservices in Spark using Apache Arrow Flight - Databricks
Machine learning pipelines are a hot topic at the moment. Moving data through the pipeline in an efficient and predictable way is one of the most important aspects of running machine learning models in production.
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da... - WSO2
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
Plug in the WSO2 Analytics Platform to some common business use cases
Showcase the numerous capabilities of the platform
Demonstrate how to collect data, analyze, predict and communicate effectively
Talk at Hug FR on December 4, 2012 about the new Apache Drill project. Notably, this talk includes an introduction to the converging specification for the logical plan in Drill.
Similar to Apache Avro in LivePerson [Hebrew] (20)
Kubernetes your tests! automation with docker on google cloud platform - LivePerson
Arik Lerner, Automation Team Leader, and Waseem Hamshawi, Automation Infra Developer, present how to build a large scale automated testing platform by leveraging containers orchestration over GCP, with the ability to scale out and provide fast feedback while maintaining a highly reliable test infrastructure.
The presentation includes a new approach to managing a scalable testing platform of distributed automated tests with Kubernetes and Docker over Google Cloud Platform.
Topics:
• GCP and Kubernetes introduction for automated testing
• Traditional Selenium Grid vs Selenium Standalone with Kubernetes and Docker for Web and Mobile tests
• Distributed and containerized testing environment over container cluster - different use cases
Ephemerals - "Short-lived Testing Endpoints" - an open source project by LivePerson that makes automation testing at large scale feel like a "walk in the park".
In this Meetup, Yaar Reuveni (Team Leader) & Nir Hedvat (Software Engineer) from the LivePerson Data Platform R&D team will talk about the journey we made from the early days of the data platform in production, with high friction and low awareness of issues, to a mature, measurable data platform that is visible and trustworthy.
In this Meetup, Arik Lerner – LivePerson team lead of Java Automation, Performance & Resilience – will talk about how we measure our services with End2End testing, which has become one of the most critical monitoring tools at LivePerson.
Over 200K test runs per day provide statistics and insights into problems as they happen.
Arik will go through different topics and stages of the journey and share details that led to the current results.
Among the topics on the menu: The Awakens of the End2End Insights
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How we debug our services. Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future logs App correlation with End2End data
• Our tools: Selenium, Jenkins and cutting edge technologies such as Kafka & ELK (Elastic search, Logstash and Kibana)
In this Meetup, Arik will host Ali AbuAli, NOC Team Leader, who will talk about e2e usage in his day-to-day work.
video: https://www.youtube.com/watch?v=IBC9gcYqNR4
In this talk Efim Dimenstein, Chief Architect at Liveperson will cover the rules and guidelines of building resilient systems, implementing them in real life and lessons learned during the process. The talk will focus on achieving resilience in real life and will feature a lot of examples and lessons learned from building systems currently in production running at extreme scale.
Efim will talk about:
· General resilience guidelines
· How they are implemented in practice
· What changes needed to be implemented to achieve resilience
· Lessons learned
· Summary
My name is Victor Perepelitsky. I'm an R&D Technical Leader at LivePerson, leading the 'Real Time Event Processing Platform' team.
In this Meetup I talked about the journey of creating the platform from scratch - challenges, design decisions, technology choices and more.
During the last 3 years the team has built the Real Time Event Processing Platform, which is currently running in production with thousands of new and migrated customers. It is built to handle hundreds of thousands of requests per second with low-latency response times (under 30 ms round trip).
I went through different topics and stages of this journey and shared details that led to specific choices and results.
“Stateful or Stateless”, “CEP”, “Rules engine”, “Automated performance testing”, “Locking”, “Timing” were a part of the menu.
In this meetup, Kobi Salant (Data Platform Technical Lead) and Vladi Feigin (Data System Architect), both from LivePerson, will talk about making scale a non-issue for real-time data apps.
Have you ever tried to build a system processing in real-time hundreds of thousands events per second and servicing more than 1M concurrent visitors?
We're going to talk about the LivePerson real-time stream processing solution doing exactly that. Learn how we empower digital call centers with insights for their critical decision making processes and never-ending efficiency goals.
In this talk Sergei Koren, Production Architect at LivePerson, will present HTTP/2, the official successor of HTTP/1.1, and how it will influence the Web as we know it.
Sergei will talk about:
- HTTP/2 history
- The major changes - the dos and don'ts
- Expected changes to the Web as we use it today
- A proposed checklist for implementation: how and when, from a production point of view
Mobile app real-time content modifications using websockets - LivePerson
We are happy to host Benny Weingarten-Gabbay, Senior Software Engineer at eBay at our offices.
Benny presents BetterContent, a tool that allows editing of an iOS mobile app in runtime, in a fun and easy way.
Read more on our DevBlog:
https://connect.liveperson.com/community/developers/blog/2015/03/26/mobile-app-real-time-content-modifications-using-websockets
Mobile SDK: Considerations & Best Practices - LivePerson
Mobile SDKs are a great way to make your service or API easily consumable by the large number of developers out there looking for state of the art tools to make their apps stand out in the competitive marketplaces, but building a stable, compatible and successful SDK is quite a challenge.
In this talk we cover the technical and design challenges involved in developing an efficient mobile SDK that is highly compatible with its host mobile app, the various considerations we took into account, and the lessons we've learned while designing and building LivePerson's native mobile SDK.
In this Meetup, Victor Perepelitsky - R&D Technical Leader at LivePerson, leading the 'Real Time Event Processing Platform' team - will talk about Java 8: the Stream API, lambdas, and method references.
Victor will clarify what functional programming is and how you can use Java 8 in order to create better software.
Victor will also cover some pain points that Java 8 did not solve and show how you can work around them.
In this lecture, Sergei Koren, System architect at LivePerson production team presents data & image compression and its effective usage in modern web and data flows.
Support Office Hour Webinar - LivePerson API - LivePerson
Course description and agenda
LivePerson enables the creation of innovative applications designed to enhance and extend the functionality of your LivePerson solution, as well as cooperate with partners worldwide.
In this session we will demonstrate the LivePerson API offerings, the development process and quick overview of CHAT API and its basic usage. You will also have an opportunity to ask questions relevant to your business.
Host: Nitay Bartal
Date: July 17, 2014
Time: 11:00 AM - 12:00 PM EST
Duration: 60 minutes
Agenda:
- Leveraging LivePerson APIs to your benefit
- Overview of LivePerson API offerings
- Introduction to LivePerson Developers Network
- Overview of the Development process
- Tools and best practices
- Helpful tips and tricks
- Q&A
SIP - More than meets the eye
Speakers:
Ofer Cohen - VOIP Group Leader, LivePerson
Yossi Maimon - VOIP Technical Leader, LivePerson
An Introduction to the SIP protocol.
SIP Position in telecommunication networks and the content services.
What is SIP:
The Session Initiation Protocol (SIP) is a signaling communications protocol, widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP) networks.
The protocol defines the messages that are sent between peers which govern establishment, termination and other essential elements of a call. SIP can be used for creating, modifying and terminating sessions consisting of one or several media streams. SIP can be used for two-party (unicast) or multiparty (multicast) sessions. Other SIP applications include video conferencing, streaming multimedia distribution, instant messaging, presence information, file transfer, fax over IP and online games.
(Source: Wikipedia)
My name is Neta Barkay, and I'm a data scientist at LivePerson.
I'd like to share with you a talk I presented at the Underscore Scala community on "Efficient MapReduce using Scalding".
In this talk I reviewed why Scalding fits big data analysis, how it enables writing quick and intuitive code with the full functionality vanilla MapReduce has, without compromising on efficient execution on the Hadoop cluster. In addition, I presented some examples of Scalding jobs which can be used to get you started, and talked about how you can use Scalding's ecosystem, which includes Cascading and the monoids from Algebird library.
Read more & Video: https://connect.liveperson.com/community/developers/blog/2014/02/25/scalding-reaching-efficient-mapreduce
Building Enterprise Level End-To-End Monitor System with Open Source Solution... - LivePerson
Recently, LivePerson's Production moved from traditional monitoring to a new enterprise monitoring system using only open source tools.
Oren Katz (Production Monitoring Team Leader) and Ittiel Savir (Automation Team Leader) will describe the road from concept to implementation at LivePerson.
In the lecture we will talk about the chosen tools, the development process, tips, and how to avoid pitfalls.
Check out Oren's recent blog post on the Subject: http://bit.ly/16i5lDS
Ofer Ron, senior data scientist at LivePerson.
Recently, I've had the pleasure of presenting an introduction to data science and data-driven products at DevconTLV.
I focused this talk around the basic ideas of data science, not the technology used, since I thought that far too many times companies and developers rush to play around with "big data" related technologies, instead of figuring out what questions they want to answer, and whether these answers form a successful product.
From a Kafkaesque Story to The Promised Land at LivePerson - LivePerson
Ran Silberman, developer & technical leader at LivePerson, presents how LivePerson moved their data platform from a legacy ETL concept to the new "Data Integration" concept of our era.
Kafka is the main infrastructure holding the backbone of data flow in the new Data Integration. That said, Kafka cannot come by itself: other supporting systems like Hadoop, Storm, and the Avro protocol were also integrated.
In this lecture Ran will describe the implementation in LivePerson and will share some tips and how to avoid pitfalls.
Read More: https://connect.liveperson.com/community/developers/blog/2013/11/21/from-a-kafkaesque-story-to-the-promised-land
LivePerson Developers is proud to host a meetup about A/B testing by Shlomo Lahav, Chief Scientist at LivePerson.
The lecture will focus on testing and the ability to draw conclusions, especially on the web.
- What is an A/B test?
- How to construct an A/B test properly?
- What are the metrics that can be used?
- Can the results be misleading?
- Errors: bias and statistical errors
- Type I and type II errors
- Measuring lift, and why lift is a biased measure
- Is it possible to change the test settings during the test?
- How to run multivariate testing effectively?
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Apache Avro in LivePerson [Hebrew]
1. Apache Avro in LivePerson
Collecting and saving data is easy
keeping it consistent is tough
Sandwich club, Sep 2014
Amihay Zer-Kavod, Software Architect
2. Who am I?
Amihay Zer-Kavod
Software Architect
Been in software since 1989
4. Communication & Meaning
● Consistent but decoupled communication
between services, such as:
o Monitoring, Interaction
o Predictive, Sentiment
o RT Reporting & Analysis
o Visitor History
event
evento
事件
घटना
حدث
ארוע
событие
● Consistent meaning over time
o BigData Store (Hadoop)
o Offline Reporting & Analysis
5. What shouldn’t we use?
Don’t use Direct APIs!
They are completely wrong for this purpose:
• They produce too much coupling between services
• APIs are synchronous by nature
• They add irrelevant complexity to the called service
6. What is needed?
The Message is the API!
● A unified event model (schema) for all reported events
● Management tools for the unified schema
● Tools for sending events over the wire
● Tools for reading/writing events in big data
● Backward and forward compatibility
7. The Event model
From a generic to a specific structure with:
• Common header - data common to all events
• Logical entities - a common header for all logical entities (such as Visitor)
• Dynamic specific headers
• Specific event body
8. Apache Avro to the rescue
● Avro - a schema-based serialization/deserialization framework
● Avro IDL - schema definition language
● Avro file - Hadoop integration
● Avro schema resolution
● Apache Avro was created by Doug Cutting
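Not from the slides, but a minimal sketch of the "Avro file" bullet above: an Avro container file stores the writer schema in its header, which is what makes it a self-describing input for Hadoop jobs. The schema, file name, and values below are invented for the example.

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import java.io.File;

public class AvroFileSketch {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"namespace\":\"com.liveperson.example\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"time\",\"type\":\"long\"}]}");

    public static void main(String[] args) throws Exception {
        File file = new File("events.avro");

        // Write a container file; the writer schema goes into the file header,
        // so any later reader (e.g. a Hadoop job) can interpret the records.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, file);
            GenericRecord event = new GenericData.Record(SCHEMA);
            event.put("id", "evt-1");
            event.put("time", System.currentTimeMillis());
            writer.append(event);
        }

        // Read it back; no schema needs to be supplied, it comes from the file.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}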
11. Avro 101 - Avro IDL Schema
@namespace("com.liveperson.example")
enum Color { NO_COLOR, BLUE, BLACK, WHITE, PINK }
/**
Example event
*/
@namespace("com.liveperson.example")
record Event {
string id = "Unknown";
long time = -1;
Color color = "NO_COLOR";
}
12. Avro 101 - Serialization
● JSON Serialization
● Binary serialization
○ int, long - variable-length zig-zag encoding
○ float, double - 4 and 8 bytes respectively
○ string - length (long) followed by UTF-8 bytes
○ map, array - unlimited size, encoded in blocks
○ union - a long index selecting the type, followed by the value
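A minimal sketch of the binary encoding in Avro's Java API, assuming a simplified Event schema in the spirit of slide 11 (the schema and values are invented). Note that the encoded bytes carry no field names; the schema supplies the structure.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class AvroBinarySketch {
    // Simplified stand-in for the Event record on slide 11.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"namespace\":\"com.liveperson.example\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"time\",\"type\":\"long\"}]}");

    public static void main(String[] args) throws Exception {
        GenericRecord event = new GenericData.Record(SCHEMA);
        event.put("id", "evt-42");
        event.put("time", System.currentTimeMillis());

        // Binary encoding: no field names on the wire, longs use zig-zag varints.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(event, encoder);
        encoder.flush();

        // Decoding requires the writer schema to give the bytes their structure.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
        System.out.println(decoded);
    }
}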
14. Avro 101 - Schema Resolution
● The writer schema must always be provided for decoding
● The reader can use its own schema
● This allows the reader and writer schemas to evolve independently
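In Avro's Java API this resolution is a matter of handing GenericDatumReader both schemas. A hedged sketch, assuming the bytes were produced with writerSchema:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import java.io.IOException;

public class SchemaResolutionSketch {
    // The writer schema describes how the bytes were encoded and is mandatory;
    // the reader schema may differ, and Avro resolves the two field by field.
    public static GenericRecord decode(byte[] bytes, Schema writerSchema, Schema readerSchema)
            throws IOException {
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return reader.read(null, decoder);
    }
}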
15. Avro vs...
Technology            Protobuf          Thrift               Avro
Created               2001 (2008)       2007                 2009
Creator / Maintainer  Google / Google   Facebook / Apache    Doug Cutting / Apache
Schema evolution      Field tags        Field tags           Schema
Static / Dynamic      Yes / No          Yes / No             Yes / Yes
Hadoop support        No                No                   Yes
RPC                   No                Yes                  Yes
Used by               Google            Facebook, Cassandra  Hadoop, LivePerson
Language support      Good              Great                Good
16. Backward & Forward Compatibility
Avro schema evolution
● Avro supports resolution between two schemas
● You need to follow a set of rules:
● Every field must have a default value
● A field can be added (make sure to give it a default value)
● Field types can not be changed (add a new field instead)
● Enum symbols can be added but never removed
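A small sketch of the rules in action (schemas invented for the example): v2 adds a color field with a default, so data written with v1 still decodes under v2, and the added field simply takes its default value.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class SchemaEvolutionSketch {
    static final Schema V1 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\",\"default\":\"Unknown\"}]}");

    // V2 adds a field; per the rules above, it carries a default value.
    static final Schema V2 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\",\"default\":\"Unknown\"},"
      + "{\"name\":\"color\",\"type\":\"string\",\"default\":\"NO_COLOR\"}]}");

    public static void main(String[] args) throws Exception {
        // Encode a record with the old (v1) schema.
        GenericRecord v1Event = new GenericData.Record(V1);
        v1Event.put("id", "evt-1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(V1).write(v1Event, enc);
        enc.flush();

        // Decode with v2 as the reader schema: the new field takes its default.
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord asV2 = new GenericDatumReader<GenericRecord>(V1, V2).read(null, dec);
        System.out.println(asV2.get("color")); // prints NO_COLOR
    }
}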
17. Avro IDL - LivePerson Event
/** Base for all LivePerson Events
*/
@namespace("com.liveperson.global")
record LPEvent {
/** Common Header of the event */
CommonHeader header = null;
/** Logical entity details participating in this event - Visitor, Agent, etc... */
array<Participant> participants = null;
/** Holding specific platform info as node name (machine) cluster Id etc... */
PlatformHeader platformSpecificHeader = null;
/** Auditing Header, Optional - adds data for auditing of the events flow in the platform*/
union {null, AuditingHeader } auditingHeader = null;
/** The event body */
EventBody eventBody = null;
}
19. How well does it work?
● Cyber Monday 2013 (one day)
o More than 320,000 events per second
o 7 Storm topologies consuming the events within seconds of real time
o 2TB of data saved to Hadoop
● 2014 preparation:
o Doubling the number of events per second, to ~640,000
20. So how did we do it?
1. Use an event-driven system, don't use direct APIs
2. Create a unified schema for all events
3. Use Avro to implement the schema
4. Add some supporting infrastructure