This document summarizes how Lucene powers LinkedIn's segmentation and targeting platform. It covers the architecture, including an indexer that uses Lucene to index member attributes computed on Hadoop and a serving layer that uses Lucene to evaluate segments. It also covers lessons learned, such as reusing index components and caching for performance. Finally, it explains why Lucene was chosen over other solutions: its support for dynamic schemas and its ability to bootstrap indexes from Hadoop.
JDBC is a Java API that allows Java programs to execute SQL statements. There are four types of JDBC drivers: Type 1 uses JDBC-ODBC bridge and ODBC driver; Type 2 uses a native database API; Type 3 uses a middleware layer for database independence; Type 4 communicates directly with the database using its native protocol. To connect to a database using JDBC, an application loads the appropriate JDBC driver and then calls DriverManager.getConnection(), specifying the database URL, username, and password.
The document discusses the four types of JDBC drivers:
- Type 1 drivers use JDBC-ODBC bridge, which converts JDBC to ODBC. They are platform dependent.
- Type 2 drivers use native database APIs and are partly Java. They require the database's client libraries to be installed.
- Type 3 drivers use a middleware layer that converts JDBC to the database protocol. They support multiple databases.
- Type 4 drivers directly convert JDBC to the database protocol. They are 100% Java but database dependent.
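A minimal sketch of the connection procedure described above, using only the JDK's java.sql package; the URL, database name, and credentials are placeholders, and without a matching driver (e.g. a Type 4 PostgreSQL driver) on the classpath, DriverManager simply reports that no suitable driver was found:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcExample {
    // Hypothetical connection details; a Type 4 driver jar on the
    // classpath would be located automatically by DriverManager.
    static final String URL = "jdbc:postgresql://localhost:5432/demo";

    public static void main(String[] args) {
        // try-with-resources closes the Connection/Statement/ResultSet for us
        try (Connection con = DriverManager.getConnection(URL, "user", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } catch (SQLException e) {
            // Without a registered driver this prints
            // "No suitable driver found for jdbc:postgresql://..."
            System.out.println(e.getMessage());
        }
    }
}
```

With a real driver jar on the classpath, the same code would print the query result instead of the driver-lookup error.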
Chapter 12: Understanding Server-Side Technologies - It Academy
Exam Objective 8.4 Describe at a high level the fundamental benefits and drawbacks of using J2EE server-side technologies, and describe and compare the basic characteristics of the web-tier, business-tier, and EIS tier.
Comparative Study That Aims RDF Processing for the Java Platform - Computer Science
This document provides a comparative study of popular Java APIs for processing Resource Description Framework (RDF) data. It summarizes three main APIs: JRDF, Sesame, and Jena. For each API, it describes key features such as storage methods, query support, documentation, and license. It finds that while each API has strengths, Sesame and Jena tend to have richer documentation and more developed feature sets than JRDF. The study aims to help Java developers choose the best RDF processing API for their needs.
The document describes LinkedIn's Segmentation & Targeting Platform, a big data application built on Hadoop. It allows users to define segments of users based on attributes and target them for marketing campaigns. Attributes can be computed from multiple data sources and consolidated. Segments are defined through a self-service portal using SQL-like queries. The platform processes complex queries fast and moves at business speed while handling LinkedIn's massive data volumes.
The addition of the REST API to ArcGIS Server at 9.3 was quickly embraced by developers as a simple but powerful way to utilize ArcGIS Server output in application development. Furthermore, the new ArcGIS API for Flex enabled the development of true Rich Internet Applications (RIAs) on top of ArcGIS Server. But the Flex framework in its current state imposes significant limitations on RESTful communication via HTTP: there's just no straightforward way to extract the headers from an HTTP response in ActionScript 3. This means you can't read the id of a newly created resource from the 'Location' header, and you can't tell the difference between a '500 Internal Server Error', a '404 Not Found', and a '422 Validation Error'. There's also no way to get the response body for anything outside the 2xx status range. And the only HTTP methods accepted are GET and POST, meaning you cannot use PUT or DELETE, at least not without a proxy. This session focuses on how to unleash the full power of REST for your RIA using a BlazeDS proxy on your application backend to circumvent the aforementioned limitations. We'll walk you through the creation of a Java-based RESTful web service using Jersey (an open source reference implementation of JSR-311/JAX-RS), and show how to set up the Flex application on the client side to utilize the complete set of RESTful web service HTTP methods and status codes.
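As a JDK-only illustration of what a server-side proxy tier gains (a sketch, not the session's BlazeDS code; the /jobs path and resource id are made up), the snippet below starts a throwaway HTTP server that answers a POST with '201 Created' plus a Location header, then reads back both the status code and the header with java.net.http.HttpClient:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LocationHeaderDemo {
    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the backend: POST /jobs answers 201 Created
        // with a Location header pointing at the new resource.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/jobs", exchange -> {
            exchange.getResponseHeaders().add("Location", "/jobs/42");
            exchange.sendResponseHeaders(201, -1); // -1 = no response body
            exchange.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        // A full HTTP client can read both the status code and the
        // Location header -- exactly what the Flex client could not do.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:" + port + "/jobs"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());

        System.out.println(response.statusCode());
        System.out.println(response.headers().firstValue("Location").orElse("?"));
        server.stop(0);
    }
}
```

Routing the Flex application's requests through a proxy written this way exposes the full range of methods, headers, and status codes to the client.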
This document compares the JAX-RS and Spring frameworks for building RESTful web services. It discusses key areas such as resource identifiers, request data binding, response handling, content negotiation, caching, security, and exception handling. Overall, both frameworks provide similar functionality for developing REST APIs but with some differences in their approaches and implementations. JAX-RS focuses solely on REST while Spring integrates REST capabilities into its existing MVC framework.
How Lucene Powers the LinkedIn Segmentation and Targeting Platform - lucenerevolution
Presented by Hien Luu, Technical Lead, LinkedIn
Rajasekaran Rangaswamy, LinkedIn
For internet companies, marketing campaigns play an important role in acquiring new customers, retaining and engaging existing customers, and promoting new products. The LinkedIn segmentation and targeting platform helps marketing teams easily and quickly create member segments based on member attributes using nested predicate expressions ranging from simple to complex. Once segments are created, those qualified members are targeted with marketing campaigns.
Lucene is a key piece of technology in this platform. This session will cover how we leverage Hadoop to efficiently build Lucene indexes for a large and growing member attribute data set of 225 million members, and how Lucene is used to create segments based on complex nested predicate expressions. This presentation will also share some of the lessons we learned and challenges we encountered from using Lucene to search over large data sets.
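The "nested predicate expressions" idea can be sketched in plain Java with java.util.function.Predicate; the member attributes and thresholds below are hypothetical, and the real platform translates such expressions into Lucene queries over the index rather than evaluating them member by member:

```java
import java.util.Map;
import java.util.function.Predicate;

public class SegmentSketch {
    public static void main(String[] args) {
        // A member is just a bag of attributes here (names are made up).
        Map<String, Object> member =
                Map.of("country", "US", "industry", "software", "connections", 700);

        // Simple attribute predicates...
        Predicate<Map<String, Object>> inUs = m -> "US".equals(m.get("country"));
        Predicate<Map<String, Object>> inSoftware = m -> "software".equals(m.get("industry"));
        Predicate<Map<String, Object>> wellConnected = m -> (int) m.get("connections") > 500;

        // ...composed into a nested expression:
        // country = US AND (industry = software OR connections > 500)
        Predicate<Map<String, Object>> segment = inUs.and(inSoftware.or(wellConnected));

        System.out.println(segment.test(member)); // prints true for this member
    }
}
```

Composing with and/or/negate this way mirrors how a simple predicate grammar nests into arbitrarily complex segment definitions.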
UNIT3 DBMS.pptx: Operation and Management of Databases - hindhe1098cv
The document discusses client-server database architecture. Some key points:
- In client-server architecture, multiple clients connect to a central server which provides services to the clients. The server processes clients' requests and returns results.
- The architecture divides applications into presentation, logic, and data tiers. The presentation tier handles the user interface. The logic tier controls application functions. The data tier stores and retrieves data from the database.
- Advantages include centralized data control and scalability. Disadvantages include a potential single point of failure if the server fails and increased hardware/software costs.
Anton Boyko, "Divide and Conquer: a set of practices for building scalable..." - Marina Peregud
The document is a presentation by Anton Boyko on scalable application architecture in the cloud. It explains that scalability is the ability of a system to handle increased workloads. It outlines different ways of scaling, including vertical scaling by increasing resources and horizontal scaling by adding more servers. The key approach discussed is "divide and conquer": dividing an application into independent modules and distributing the workload. Several patterns for scalability are described, such as static content hosting, queue-based load leveling, sharding of data, and separating reads from writes. The presentation emphasizes that there is no single best approach and that careful judgment must be used to determine the appropriate architecture.
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
This session will cover building the modern Data Warehouse by migration from the traditional DW platform into the cloud, using Amazon Redshift and Cloud ETL Matillion in order to provide Self-Service BI for the business audience. This topic will cover the technical migration path of DW with PL/SQL ETL to the Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will be focusing on working backward through the process, i.e. starting from the business audience and their needs that drive changes in the old DW. Finally, this talk will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using modern BI platform Tableau.
Just the Job: Employing Solr for Recruitment Search - Charlie Hull, lucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Using a case study on a major European executive recruitment company, we will show how we used Apache Lucene/Solr to build powerful, flexible, accurate and scalable search services over tens of millions of CVs and candidate records, allowing the company to completely restructure their IT provision for both local and national offices.
This document summarizes Terry Bunio's presentation on breaking and fixing broken data. It begins by thanking sponsors and providing information about Terry Bunio and upcoming SQL events. It then discusses the three types of broken data: inconsistent, incoherent, and ineffectual data. For each type, it provides an example and suggestions on how to identify and fix the issues. It demonstrates how to use tools like Oracle Data Modeler, execution plans, SQL Profiler, and OStress to diagnose problems to make data more consistent, coherent and effective.
The document discusses MySQL replication. It defines two types of replication - statement-based and row-based replication. It explains that replication works by recording changes in the master's binary log and replaying the log on slaves. It provides steps for configuring replication including setting up accounts, configuring the master and slave, and instructing the slave to connect to the master. It also lists some benefits of replication like data distribution, load balancing, backups, and high availability.
Choosing the Right Business Intelligence Tools for Your Data and Architectura... - Victor Holman
This document discusses various business intelligence tools for data analysis including ETL, OLAP, reporting, and metadata tools. It provides evaluation criteria for selecting tools, such as considering budget, requirements, and technical skills. Popular tools are identified for each category, including Informatica, Cognos, and Oracle Warehouse Builder. Implementation requires determining sources, data volume, and transformations for ETL as well as performance needs and customization for OLAP and reporting.
Implementation of Oracle ExaData and OFM 11g with Banner in HCT - Khalid Tariq
This document summarizes the implementation of Oracle Exadata and Oracle Fusion Middleware 11g with the Banner student information system at HCT. It provides an agenda, introduction to HCT, history of Banner/Oracle at HCT, specifications of the Exadata implementation, and a Q&A section. Key details include upgrading Banner to the latest versions, testing Exadata in a development environment before moving production Banner databases and applications to Exadata in May 2012, and performance/administration improvements with the new system.
Alex Mang: Patterns for Scalability in Microsoft Azure Application - Codecamp Romania
The document discusses patterns for scalability in Microsoft Azure applications. It covers queue-based load leveling, competing consumers, and priority queue patterns for handling application load and message processing. It also discusses materialized view and sharding patterns for scaling databases, where materialized views optimize queries and sharding partitions data horizontally across multiple servers. The talk includes demos of priority queue and sharding patterns to illustrate their implementations.
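The sharding pattern mentioned above boils down to a stable routing function from key to partition. A minimal sketch follows (the shard names and count are made up; production systems usually prefer consistent hashing so that resharding moves fewer keys):

```java
import java.util.List;

public class ShardRouter {
    // Route a key to one of N shards by stable hash.
    static int shardFor(String key, int shardCount) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        List<String> shards = List.of("db-0", "db-1", "db-2", "db-3");
        for (String user : List.of("alice", "bob", "carol")) {
            System.out.println(user + " -> " + shards.get(shardFor(user, shards.size())));
        }
    }
}
```

Because the hash is deterministic, every node in the application tier routes the same key to the same shard without coordination.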
The document discusses strategies for transitioning from monolithic architectures to microservice architectures. It outlines some of the challenges with maintaining large monolithic applications and reasons for modernizing, such as handling more data and needing faster changes. It then covers microservice design principles and best practices, including service decomposition, distributed systems strategies, and reactive design. Finally it introduces Lagom as a framework for building reactive microservices on the JVM and outlines its key components and development environment.
The document discusses Cloudify, an open source platform for deploying, managing, and scaling complex multi-tier applications on cloud infrastructures. It introduces key concepts of Cloudify including topologies defined using TOSCA, workflows written in Python, policies defined in YAML, and how Cloudify ties various automation tools together across the deployment continuum. The document also provides demonstrations of uploading a blueprint to Cloudify and installing an application using workflows, and discusses how Cloudify collects logs, metrics and handles events during workflow execution.
A Public Cloud Based SOA Workflow for Machine Learning Based Recommendation A...Ram G Athreya
Over the past decade the field of Cloud Computing has been the focus of intensive research. In this paper we propose a framework that will simulate the architectural setup of a cloud environment and examine how it can leverage Apriori and Sequential Pattern based recommendation algorithms through R. Furthermore, we present a multi layered application encompassing its backend architecture, user interface built using the responsive web design technique and its development workflow. The proposed system was also exhaustively load tested using Apache JMeter to ensure its reliability at scale and the experimental results are presented.
Solr and ElasticSearch demo and speaker feb 2014 - nkabra
The document provides an overview of distributed database architecture and search technologies. It discusses Solr and ElasticSearch, including their history, key features, use cases, and migration process. The presentation covers the basics, current usage, and highlights, followed by questions. Examples are provided of companies using ElasticSearch for applications such as resume recommendations, integration, and searching large collections of documents.
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea... - Lucidworks
The document describes Bloomreach's architecture for managing a large-scale SolrCloud cluster across multiple data centers. It discusses the challenges of serving real-time queries at scale, managing configurations and rankings across tenants and data centers, and providing high availability and recovery capabilities. The key components of Bloomreach's architecture include a cluster management suite, replication and ranking configuration APIs, and deployment/recovery services for adding or replacing data centers and collections.
This document provides an introduction to Microsoft Azure, including key concepts like cloud computing, virtualization, cloud service models, and Azure components. It covers Azure storage services like blobs, tables, and SQL Azure. It also discusses the developer experience on Azure, using familiar tools like Visual Studio. Traffic Manager is introduced as a way to control traffic distribution for high availability. The document demonstrates deploying a web app to Azure and provides an overview of StudioRG infrastructure with an example.
Similar to How Lucene Powers the LinkedIn Segmentation and Targeting Platform (20)
Introduction of Cybersecurity with OSS at Code Europe 2024 - Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready, and for which client coverage is growing, making scaling and performance life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors - DianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service, including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
4. Agenda
• Little bit about LinkedIn
• Segmentation & Targeting Platform Overview
• How Lucene powers Segmentation & Targeting Platform
• Q&A
5. Our Mission
Connect the world’s professionals to make them more productive and successful.
Our Vision
Create economic opportunity for every professional in the world.
Members First!
6. The world’s largest professional network
Over 65% of members are now international
• >30M
• >90% of Fortune 100 companies use LinkedIn Talent Solutions to hire
• >3M company pages
• 19 languages
• >5.7B professional searches in 2012
7. Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world!
• LinkedIn has ~4200 full-time employees located around the world
11. Segmentation & Targeting Platform Overview
1. Create attributes:
§ Name
§ Email
§ State
§ Occupation
§ Etc.
2. Attributes added to table:
Name | Email | State | Occupation
John Smith | jsmith@blah.com | California | Engineer
Jane Smith | smithj@mail.com | Nevada | HR Manager
Jane Doe | jdoe@email.com | California | Engineer
3. Create target segment (California, Engineer):
Name | Email | State | Occupation
John Smith | jsmith@blah.com | California | Engineer
Jane Doe | jdoe@email.com | California | Engineer
4. Export list & send to vendor
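The four-step flow above amounts to filtering a member table on attribute values and exporting the matches. A minimal in-memory sketch, using the example rows from the slide; the `Member` record type is illustrative, not LinkedIn's actual schema.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SegmentFilter {
    // Illustrative record type, not LinkedIn's actual attribute schema.
    record Member(String name, String email, String state, String occupation) {}

    // Step 3: select members matching the targeting criteria (e.g. California, Engineer).
    static List<Member> targetSegment(List<Member> members, String state, String occupation) {
        return members.stream()
                .filter(m -> m.state().equals(state) && m.occupation().equals(occupation))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Member> table = List.of(
                new Member("John Smith", "jsmith@blah.com", "California", "Engineer"),
                new Member("Jane Smith", "smithj@mail.com", "Nevada", "HR Manager"),
                new Member("Jane Doe", "jdoe@email.com", "California", "Engineer"));
        // Step 4: export the matching list (here, just print the emails).
        targetSegment(table, "California", "Engineer")
                .forEach(m -> System.out.println(m.email()));
    }
}
```

In the real platform this filtering runs against a Lucene index rather than an in-memory list, which is what makes arbitrary attribute combinations cheap to evaluate.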
12. Segmentation & Targeting Platform Overview
• Business definition
– Business would like to launch new campaigns often
– Business would like to specify targeting criteria using an arbitrary set of attributes
– Attributes need to be computed to fulfill the targeting criteria
– The attribute data resides on Hadoop or TD
– Business is most comfortable with a SQL-like language
18. Segmentation & Targeting Platform Overview
Who are the job seekers?
Who are the LinkedIn Talent Solution prospects in Europe?
Who are the North American recruiters that don’t work for a competitor?
23. Indexer Architecture
[Diagram] Attribute definitions from a MySQL attribute store, together with Avro data in HDFS, feed a Hadoop Indexer MapReduce job:
• Mapper: K => AvroKey&lt;GenericRecord&gt;, V => AvroValue&lt;NullWritable&gt;
• Reducer: K => NullWritable, V => LuceneDocumentWrapper
• A LuceneOutputFormat RecordWriter turns each LuceneDocumentWrapper into a Lucene Document and writes it to the index
• The job produces shards 1..n; an Index Merger combines them before they are served by the web servers
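The indexer job fans records out into shards 1..n. The deck does not say how records are partitioned, so the hash-mod scheme below is an assumption; it is the kind of routing a mapper (or a Hadoop Partitioner) could apply to decide which shard a record's Lucene document lands in.

```java
public class ShardRouter {
    private final int numShards;

    public ShardRouter(int numShards) {
        if (numShards <= 0) throw new IllegalArgumentException("numShards must be positive");
        this.numShards = numShards;
    }

    // Route a record key to one of n shards. Masking the sign bit keeps the
    // result in [0, numShards). Hash-mod is an assumed scheme; the deck does
    // not specify the actual partitioning function.
    public int shardFor(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % numShards;
    }
}
```

Because the routing is deterministic, rebuilding (bootstrapping) the index from Hadoop sends every record back to the same shard.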
25. How Lucene powers Segmentation & Targeting Platform
• Architecture
– Indexer Architecture
– Serving Architecture
• Load Balanced Model
• Next Steps - Distributed Model
• DocValues
• Lessons Learnt
• Why not use an existing solution?
26. Serving – Load Balanced Model
[Diagram] An HTTP request hits a load balancer, which routes it to Web Server 1..n; each web server serves its shard (Shard 1..n) from a shared drive.
27. Serving – Load Balanced Model
But wait…
• Is load balancing alone good enough?
• What about distribution and failover?
28. How Lucene powers Segmentation & Targeting Platform
• Architecture
– Indexer Architecture
– Serving Architecture
• Load Balanced Model
• Next Steps - Distributed Model
• DocValues
• Lessons Learnt
• Why not use an existing solution?
29. Next Steps – Distributed Model
• A generic cluster management framework
• Manages partitioned and replicated resources in distributed systems
• Built on top of Zookeeper, hiding the complexity of ZK primitives
• Provides distributed features such as leader election, two-phase commit, etc. via a state machine model
http://helix.incubator.apache.org/
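The leader-election primitive Helix builds on ZooKeeper can be illustrated with a toy model: each participant receives an increasing sequence number (analogous to an ephemeral sequential znode), and the live participant with the lowest number is the leader. This is a conceptual sketch of the ZooKeeper recipe, not Helix's actual API.

```java
import java.util.Map;
import java.util.TreeMap;

public class LeaderElection {
    // Sequence number -> node id, ordered by sequence (like sequential znodes).
    private final TreeMap<Long, String> participants = new TreeMap<>();
    private long nextSeq = 0;

    // A node joins and is assigned the next sequence number.
    public long join(String nodeId) {
        long seq = nextSeq++;
        participants.put(seq, nodeId);
        return seq;
    }

    // A node leaving (or dying) removes its entry, the way an ephemeral
    // znode disappears; leadership moves to the next-lowest sequence number.
    public void leave(long seq) {
        participants.remove(seq);
    }

    public String leader() {
        Map.Entry<Long, String> first = participants.firstEntry();
        return first == null ? null : first.getValue();
    }
}
```

Helix layers state machines (e.g. active/standby transitions) on top of primitives like this, which is what the deck means by hiding the complexity of raw ZK.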
30. Next Steps – Distributed Model
[Diagram] An HTTP request goes through a load balancer to a scatter-gather layer over Web Servers 1-3. Each shard has an active copy on one server and a standby replica on another: Shards 1, 2, 3 are active on Web Servers 1, 2, 3, with standby copies of Shards 2, 3, 1 placed alongside them.
31. Next Steps – Distributed Model
[Diagram] The same topology under failure: when Web Server 3 fails, its active copy of Shard 3 and standby copy of Shard 1 are lost, and the standby replica of Shard 3 on another web server is promoted to active, so the scatter-gather layer can still reach every shard.
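The two diagrams above can be modeled as an active/standby routing table: each shard has one active host and one standby host on a different server, and on failure the standby is promoted. The class and method names below are illustrative, not taken from Helix.

```java
import java.util.HashMap;
import java.util.Map;

public class ShardRoutingTable {
    private final Map<Integer, String> active = new HashMap<>();
    private final Map<Integer, String> standby = new HashMap<>();

    // Place a shard's active copy and its standby replica (ideally on
    // different web servers, as in the slides).
    public void assign(int shard, String activeHost, String standbyHost) {
        active.put(shard, activeHost);
        standby.put(shard, standbyHost);
    }

    // Scatter-gather asks this table which host currently serves a shard.
    public String hostFor(int shard) {
        return active.get(shard);
    }

    // When the active copy fails, promote the standby so the shard
    // remains reachable.
    public void failover(int shard) {
        String promoted = standby.remove(shard);
        if (promoted != null) active.put(shard, promoted);
    }
}
```

In the real system a framework like Helix drives these transitions via its state machine; the table here only shows the routing outcome.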
32. • Architecture
– Indexer Architecture
– Serving Architecture
• Load Balanced Model
• Next Steps - Distributed Model
• DocValues
• Lessons Learnt
• Why not use an existing solution?
33. DocValues – Use Case
• Once segments are built, users want to forecast: see a target revenue projection for the campaigns they want to run
• Campaigns can be run on various revenue models
• This involves adding per-member propensity scores and dollar amounts
34. DocValues – Why not Stored Fields?
• Stored fields have one indirection per document, resulting in two disk seeks per document:
– .fdx: fetch the file pointer to the field data
– .fdt: scan by document ID until the field is found
• The performance cost quickly adds up when fetching millions of documents
35. DocValues – Why not Field Cache?
– It is memory resident
– Works fine when there is enough memory
– But keeping millions of un-inverted values in memory is impossible
– There is an additional cost to parse values (from String and to String)
36. DocValues
• Dense column-based storage
– (1 value per document, 1 column per field and segment)
• Accepts primitives
• No conversion from/to String needed
• Loads 80x-100x faster than building a FieldCache
• All the work is done during indexing
• DocValues fields can be indexed and stored too
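The contrast between stored-field indirection and DocValues' dense columns can be shown with a toy columnar store: one primitive array per field, addressed directly by document ID, so a lookup is a single array access with no pointer chase and no String parsing. This is a conceptual sketch, not Lucene's implementation.

```java
public class DenseColumn {
    // 1 value per document, 1 column per field and segment, as on the slide.
    private final long[] values;

    public DenseColumn(int maxDoc) {
        this.values = new long[maxDoc];
    }

    // All the work happens at indexing time: values are laid out densely
    // in docId order...
    public void set(int docId, long value) {
        values[docId] = value;
    }

    // ...so a read is one array access. Contrast with stored fields, where
    // reading a value means fetching a file pointer (.fdx) and then scanning
    // the field data (.fdt) for each document.
    public long get(int docId) {
        return values[docId];
    }
}
```

Fetching per-member propensity scores or dollar amounts for millions of documents is exactly the access pattern this layout makes cheap.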
37. • Architecture
– Indexer Architecture
– Serving Architecture
• Load Balanced Model
• Next Steps - Distributed Model
• DocValues
• Lessons Learnt
• Why not use an existing solution?
38. Lessons Learnt
Indexing
• Reuse index writers, field and document instances
• Create many partitions and merge them in a different process
• Rebuild (bootstrap) the entire index if possible
• Use partial updates with caution
• Analyze the index
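"Reuse index writers, field and document instances" is about avoiding per-document allocation: keep one mutable holder and overwrite its value each iteration. In Lucene this is done by calling `Field.setStringValue()` on a single `Field` added once to a single `Document`; the pure-Java sketch below shows the same pattern with a hypothetical holder so it stands alone without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusableHolder {
    // Stand-in for a Lucene Field: a mutable, reusable value holder.
    static final class NameField {
        private String value;
        void setValue(String v) { this.value = v; }
        String value() { return value; }
    }

    public static List<String> indexAll(List<String> names) {
        NameField field = new NameField();       // allocated once, reused
        List<String> indexed = new ArrayList<>();
        for (String name : names) {
            field.setValue(name);                // overwrite, don't reallocate
            indexed.add(field.value());          // "write" the current value
        }
        return indexed;
    }
}
```

Per-document allocation is pure garbage-collector pressure during a bulk index build, which is why the deck calls out instance reuse first.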
39. Lessons Learnt
Serving
• Reuse a single instance of IndexSearcher
• Limit usage of stored fields and term vectors
• Plan for load balancing and failover
• Cache term frequencies
• Use different machines for serving and indexing
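"Cache term frequencies" can be done with a memoizing map in front of the expensive per-term index lookup. The loader function below is a stand-in for the real index call; the deck says only that frequencies are cached, not how.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

public class TermFreqCache {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private final ToLongFunction<String> loader; // stand-in for the index lookup

    public TermFreqCache(ToLongFunction<String> loader) {
        this.loader = loader;
    }

    // The first call per term hits the loader; later calls are served
    // from memory. computeIfAbsent also makes the cache safe to share
    // across serving threads.
    public long freq(String term) {
        return cache.computeIfAbsent(term, loader::applyAsLong);
    }
}
```

Since the index is rebuilt out-of-band and swapped in, a cache like this only needs to be invalidated when a new index generation is loaded.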
40. • Architecture
– Indexer Architecture
– Serving Architecture
• Load Balanced Model
• Next Steps - Distributed Model
• DocValues
• Lessons Learnt
• Why not use an existing solution?
41. Why not use existing solutions?
Candidate solution 1:
• Doesn’t allow dynamic schema
• Difficult to bootstrap indexes built in Hadoop
• Indexing elevates query latency
Candidate solution 2:
• Doesn’t allow dynamic schema
• Difficult to bootstrap indexes built in Hadoop
• Larger memory overhead
• Comparatively slow