This talk was given by Bhaskar Ghosh (Senior Director of Engineering, LinkedIn Data Infrastructure) at the Yale Oct 2012 Symposium on Big Data, in honor of Martin Schultz.
The document describes LinkedIn's Segmentation & Targeting Platform, a big data application built on Hadoop. It allows users to define segments of users based on attributes and target them for marketing campaigns. Attributes can be computed from multiple data sources and consolidated. Segments are defined through a self-service portal using SQL-like queries. The platform processes complex queries fast and moves at business speed while handling LinkedIn's massive data volumes.
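The attribute-plus-predicate style of segment selection described above can be sketched in a few lines. The member records and field names below are invented for illustration; they are not LinkedIn's actual schema.

```python
# Hypothetical attribute-based segment selection, in the spirit of a
# SQL-like segment query. All field names and values are invented.
members = [
    {"id": 1, "industry": "software", "seniority": "senior", "region": "US"},
    {"id": 2, "industry": "finance",  "seniority": "entry",  "region": "EU"},
    {"id": 3, "industry": "software", "seniority": "entry",  "region": "US"},
]

def select_segment(rows, **criteria):
    """Return member ids whose attributes match every criterion, roughly:
    SELECT id FROM members WHERE industry = ... AND region = ..."""
    return [r["id"] for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]

print(select_segment(members, industry="software", region="US"))  # [1, 3]
```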
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019] – Shirshanka Das
The speaker examines different metadata strategies for modeling metadata, storing metadata, and then scaling the acquisition and refinement of metadata for thousands of metadata authors and producing systems. They dive into the pros and cons of each strategy and in which scenarios they think organizations should deploy them. They explore strategies including generic types versus specific types, crawling versus publish/subscribe, single source of truth versus multiple federated sources of truth, automated classification of data, lineage propagation, and more.
The document provides an introduction to the Semantic Web by defining it in multiple ways: a) as a family of Web standards to make data easier to use and reuse, b) as an upgrade to the current Web enabling more intelligent applications, and c) as a collection of metadata technologies to improve business software adaptability and responsiveness. It notes what the Semantic Web is not (e.g. not a better search engine or tagged HTML) and provides examples of how the Semantic Web could benefit individuals by making their lives simpler and businesses by empowering new capabilities and reducing IT costs through standardized metadata linking. Finally, it discusses some early examples and implementations as well as next steps for exploring and prototyping with Semantic Web technologies.
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering... – Mitul Tiwari
LinkedIn has a large professional network with 360M members. They build data-driven products using members' rich profile data. To do this, they ingest online data into offline systems using Apache Kafka. The data is then processed using Hadoop, Spark, Samza and Cubert to compute features and train models. Results are moved back online using Voldemort and Kafka. For example, People You May Know recommendations are generated by triangle closing in Hadoop and Cubert to count common connections faster. Site speed is monitored in real-time using Samza to join logs from different services.
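The triangle-closing step mentioned above, ranking friends-of-friends by their number of common connections, can be sketched as follows. This is a toy in-memory version, not the production Hadoop/Cubert implementation.

```python
# Minimal sketch of "triangle closing" for People You May Know:
# candidates are friends-of-friends, ranked by common-connection count.
# The toy graph below is invented for illustration.
from collections import Counter

connections = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave":  {"bob", "carol"},
}

def pymk(member):
    counts = Counter()
    for friend in connections[member]:
        for fof in connections[friend]:
            if fof != member and fof not in connections[member]:
                counts[fof] += 1  # one more common connection
    return counts.most_common()

print(pymk("alice"))  # [('dave', 2)]
```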
SharePoint Migrations Pitfalls from the Crypt – John Mongell
This document provides an overview of McGladrey, a large accounting firm, and outlines the agenda for a presentation on SharePoint migrations. The presentation covers the elements of a migration, important pre-migration steps like analysis and validation, testing the migration, and post-migration steps. It emphasizes the importance of thorough planning, documentation, and testing to prevent issues during and after the migration.
Action from Insight - Joining the 2 Percent Who are Getting Big Data Right – StampedeCon
Today’s world is awash in data, and organizations are rapidly discovering that putting this data to work is the single most important factor in their ability to remain relevant to hyper-connected consumers. In this session, HP will explore the new trends of this appified, thingified, context-rich world and how HP’s Haven platform can give you an edge over your competition.
Oncrawl Elasticsearch meetup France #12 – Tanguy MOAL
Presentation detailing how Elasticsearch is involved in Oncrawl, a SaaS solution for easy SEO monitoring.
The presentation explains how the application is built, and how it integrates Elasticsearch, a powerful general purpose search engine.
Oncrawl is data-centric, and Elasticsearch is used as an analytics engine rather than a full-text search engine.
The application uses Apache Hadoop and Apache Nutch for the crawl pipeline and data analysis.
Oncrawl is a Cogniteev solution.
[This work was presented at SIGMOD'13.]
The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the "last mile" issues in providing a rich developer ecosystem. This includes easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a 1-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digesting to descriptive analytical dashboards for our members.
The document discusses semantic systems and how they can help solve problems related to integrating different types of systems by facilitating interoperability. It outlines some of the key challenges, such as the lack of tools that are easy for average users while also being powerful enough for experts. The document also discusses different semantic technologies like ontologies, logic programming, and the Semantic Web that could help address these challenges if implemented properly with a focus on integration rather than fragmentation.
Video: https://www.youtube.com/watch?v=Rt2oHibJT4k
Technologies such as Hadoop have addressed the "Volume" problem of Big Data, and technologies such as Spark have recently addressed the "Velocity" problem – but the "Variety" problem is largely unaddressed: there is still a lot of manual "data wrangling" to manage data models.
These manual processes do not scale well. Not only is the variety of data increasing; the rate of change in data definitions is increasing as well. We can't keep up. NoSQL data repositories can handle storage, but we need effective models of the data to fully utilize it.
This talk will present tools and a methodology to manage Big Data Models in a rapidly changing world. This talk covers:
Creating Semantic Metadata Models of Big Data Resources
Graphical UI Tools for Big Data Models
Tools to synchronize Big Data Models and Application Code
Using NoSQL Databases, such as Amazon DynamoDB, with Big Data Models
Using Big Data Models with Hadoop, Storm, Spark, Giraph, and Inference
Using Big Data Models with Machine Learning to generate Predictive Models
Developer Collaborative/Coordination processes using Big Data Models and Git
Managing change – Big Data Models with rapidly changing Data Resources
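As one way to picture the first item above, a semantic metadata model can be held as subject-predicate-object triples describing a data resource. The entity, field, and predicate names here are illustrative only, not a real API.

```python
# A semantic metadata model as subject-predicate-object triples
# describing a hypothetical "Customer" data resource.
model = [
    ("Customer",             "hasField", "email"),
    ("Customer",             "hasField", "signup_date"),
    ("Customer.email",       "hasType",  "string"),
    ("Customer.signup_date", "hasType",  "date"),
]

def fields_of(entity):
    """All declared fields of an entity."""
    return {o for s, p, o in model if s == entity and p == "hasField"}

def type_of(entity, field):
    """Declared type of a field, or None if the model doesn't say."""
    key = f"{entity}.{field}"
    return next((o for s, p, o in model if s == key and p == "hasType"), None)

print(fields_of("Customer"))            # {'email', 'signup_date'}
print(type_of("Customer", "email"))     # string

# Rapid change is just appending triples, no schema migration:
model.append(("Customer", "hasField", "phone"))
```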
Hadoop and Neo4j: A Winning Combination for Bioinformatics – osintegrators
This presentation includes an intro to bioinformatics with an emphasis on human genome re-sequencing and how Hadoop and Neo4j can be used together to open striking possibilities.
Optimizing the Data Supply Chain for Data Science – Vital.AI
As we move from the Data Warehouse to the Data Supply Chain, we open our perspective to include the full life cycle of data, from raw material to data product.
To produce data products with the most value, in an efficient and cost effective manner, quality control processes must be put into place at each link in the chain, driven by the requirements of data scientists. With such quality control processes in place, the burden of data scientists to cleanse data – typically 80% of the data scientists’ efforts – can be greatly reduced.
Data Models – including schema, metadata, rules, and provenance – play a crucial role in ensuring an effective Data Supply Chain.
Each Data Supply Chain link must be defined with firm boundaries with clear lines of team responsibility – with Data Models providing the natural borders.
In this talk we will discuss the processes that must be put into place at each link in the Data Supply Chain including perspectives on:
* The definition of Data Supply Chain vs. Data Warehouse
* Tools to create, manage, utilize, and share Data Models
* Tracking Data Provenance
* ETL processes, driven by Data Models
* Collaborative processes across Data Science teams
* Visualization of Data and Data Flow across the Data Supply Chain
* Apache Hadoop and Apache Spark as enabling technologies
* Data Science
* Cross-Organizational Collaboration
* Security
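One way to picture the per-link quality control described above is a validation gate that quarantines records failing the data model's checks rather than passing them downstream. The schema and field names below are a made-up example.

```python
# Quality-control gate at one link of a data supply chain:
# records that fail the model's type checks are quarantined.
schema = {"id": int, "score": float}  # hypothetical data model for this link

def quality_gate(records, schema):
    passed, quarantined = [], []
    for rec in records:
        ok = all(isinstance(rec.get(f), t) for f, t in schema.items())
        (passed if ok else quarantined).append(rec)
    return passed, quarantined

raw = [{"id": 1, "score": 0.9},
       {"id": "2", "score": 0.7}]   # "2" is a string: fails the model
good, bad = quality_gate(raw, schema)
print(len(good), len(bad))  # 1 1
```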
An Introduction to Graph: Database, Analytics, and Cloud Services – Jean Ihm
Graph analysis employs powerful algorithms to explore and discover relationships in social network, IoT, big data, and complex transaction data. Learn how graph technologies are used in applications such as fraud detection for banking, customer 360, public safety, and manufacturing. This session will provide an overview and demos of graph technologies for Oracle Cloud Services, Oracle Database, NoSQL, Spark and Hadoop, including PGX analytics and PGQL property graph query language.
Presented at Analytics and Data Summit, March 20, 2018
The document discusses a presentation about connecting data and Neo4j. It covers data ecosystems and where different technologies fit, how Neo4j works as a graph database, and building graph-native organizations. It also discusses Neo4j's long term vision of connecting enterprise data and the state of data in 2018. Key points include how data structures have evolved from hierarchies to dynamic knowledge graphs and how different technologies like relational databases and Neo4j are suited for different types of queries and connected data problems.
This document provides an overview of graph databases and their use cases. It begins with definitions of graphs and graph databases. It then gives examples of how graph databases can be used for social networking, network management, and other domains where data is interconnected. It provides Cypher examples for creating and querying graph patterns in a social networking and IT network management scenario. Finally, it discusses the graph database ecosystem and how graphs can be deployed for both online transaction processing and batch processing use cases.
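The friend-of-friend pattern such Cypher examples typically express, e.g. `MATCH (a:Person)-[:KNOWS]->(b)-[:KNOWS]->(c) WHERE a <> c RETURN c`, can be sketched over a plain adjacency map. The graph here is toy data, not from the presentation.

```python
# Friend-of-friend pattern matching over an in-memory adjacency map,
# roughly what a Cypher query like
#   MATCH (a:Person)-[:KNOWS]->(b)-[:KNOWS]->(c) WHERE a <> c RETURN c
# would express against a graph database.
knows = {"ann": ["ben"], "ben": ["cat", "ann"], "cat": []}

def friends_of_friends(start):
    return sorted({c for b in knows.get(start, [])
                     for c in knows.get(b, []) if c != start})

print(friends_of_friends("ann"))  # ['cat']
```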
Popular applications like Yelp, Google Maps, and Groupon have provided convenient ways to discover and blueprint all the details of a fun and exciting date. These services provide users with the ability to choose a date based on the type of activity, rating, location, deals offered, and a plethora of other options. With the improvements of mobile devices and applications, all of this planning can even be done right in the palm of our hands. However, when planning out the details of a date, users are constantly switching back and forth between applications in order to find that ideal combination of activity type, rating, location, and deals. This can take a considerable amount of time and be quite frustrating due to the amount of factors to consider and the abundance of options available on the web. Through the use of semantic web technology, everything necessary to plan a perfect date can be combined into one easy to use application. The semantic web provides meaning to every piece of information on the web through the use of ontologies, making knowledge readily available and personally tailored to each person who needs it.
Vital AI MetaQL: Queries Across NoSQL, SQL, SPARQL, and Spark – Vital.AI
This document provides an overview of MetaQL, which allows composing queries across NoSQL, SQL, SPARQL, and Spark databases using a domain model. Key points include:
- MetaQL uses a domain model to define concepts and compose typed queries in code that can execute across different databases.
- This separates concerns and improves developer efficiency over managing schemas and databases separately.
- Examples demonstrate MetaQL queries in graph, path, select, and aggregation formats across SQL, NoSQL, and RDF implementations.
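A rough sketch of that idea, one typed query object compiled to more than one backend, might look like the following. The class and method names are invented for illustration and do not reflect the real MetaQL API.

```python
# One typed query object, compiled to different backends.
# Hypothetical API: Select("Person", name="Ada") stands in for a
# domain-model-driven query in the MetaQL style.
class Select:
    def __init__(self, entity, **where):
        self.entity, self.where = entity, where

    def to_sql(self):
        cond = " AND ".join(f"{k} = '{v}'" for k, v in self.where.items())
        return f"SELECT * FROM {self.entity} WHERE {cond}"

    def to_sparql(self):
        pats = " . ".join(f'?s <{self.entity}#{k}> "{v}"'
                          for k, v in self.where.items())
        return f"SELECT ?s WHERE {{ {pats} }}"

q = Select("Person", name="Ada")
print(q.to_sql())     # SELECT * FROM Person WHERE name = 'Ada'
print(q.to_sparql())  # SELECT ?s WHERE { ?s <Person#name> "Ada" }
```

The point of the design is that the query is composed once, in terms of the domain model, and the backend-specific serialization is a detail the developer never hand-writes.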
Microsoft and Hortonworks Deliver the Modern Data Architecture for Big Data – Hortonworks
Joint webinar with Microsoft and Hortonworks on the power of combining the Hortonworks Data Platform with Microsoft's ubiquitous Windows, Office, SQL Server, Parallel Data Warehouse, and Azure platform to build the Modern Data Architecture for Big Data.
How to build your own Delve: combining machine learning, big data and SharePoint – Joris Poelmans
You are experiencing the benefits of machine learning every day through product recommendations on Amazon & Bol.com, credit card fraud prevention, etc. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R, and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
Big Data and AI in P2P Industry: Knowledge Graph and Inference – sfbiganalytics
The document discusses how Puhui Finance, a Chinese P2P lending company, uses big data and AI techniques for risk control. It introduces their Feature Compute Engine, which converts unstructured user data into structured features, and their Knowledge Graph, which connects entities and analyzes relationships. Specific use cases discussed include anti-fraud detection using rules, contact recovery by building phone networks, and detecting high-risk individuals via search engines. Challenges around unstructured data, name disambiguation, reasoning and lack of training data are also covered.
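The phone-network idea above, linking applicants who share phone contacts so dense neighborhoods can be inspected for fraud rings, can be sketched as follows. All names and numbers are fabricated toy data.

```python
# Build "shared contact" edges between applicants: two applicants are
# linked when their phone books overlap. Dense clusters of such edges
# are candidates for fraud-ring review. Toy data only.
from itertools import combinations

phone_book = {
    "applicant_a": {"555-01", "555-02"},
    "applicant_b": {"555-02", "555-03"},
    "applicant_c": {"555-09"},
}

def shared_contact_edges(book, min_shared=1):
    edges = []
    for (u, cu), (v, cv) in combinations(book.items(), 2):
        shared = cu & cv
        if len(shared) >= min_shared:
            edges.append((u, v, len(shared)))  # edge weight = overlap size
    return edges

print(shared_contact_edges(phone_book))  # [('applicant_a', 'applicant_b', 1)]
```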
Before jumping straight into development of such a graph-based app, we asked the question that anyone would ask: "What makes it a case for Neo4j? And can you prove it?" Basically de-risking and making a case for management buy-in. Further, it was as much about convincing ourselves, hence this comparison.
So this is about that comparison and the white paper that resulted from it. It is not the actual project. Source code used to generate the comparison numbers is available at https://github.com/EqualExperts/Apiary-Neo4j-RDBMS-Comparison
Graph Databases - Where Do We Do the Modeling Part? – DATAVERSITY
Graph processing and graph databases have been with us for a while. However, since their physical implementations are the same for every database in production (nodes connected to nodes, or triples), there's a perception that data modeling (and data modelers) have no role on projects where graph databases are used.
This month we'll talk about where graph databases are a best fit in a modern data architecture and where data models add value.
Introduction to Microsoft HDInsight and BI Tools – DataWorks Summit
This document discusses Hortonworks Data Platform (HDP) for Windows. It includes an agenda for the presentation which covers an introduction to HDP for Windows, integrating HDP with Microsoft tools, and a demo. The document lists the speakers and provides information on Windows support for Hadoop components. It describes what is included in HDP for Windows, such as deployment choices and full interoperability across platforms. Integration with Microsoft tools like SQL Server, Excel, and Power BI is highlighted. A demo of using Excel to interact with HDP is promised.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Challenges in the Design of a Graph Database Benchmark – graphdevroom
Graph databases are one of the leading drivers in the emerging, highly heterogeneous landscape of database management systems for non-relational data management and processing. The recent interest and success of graph databases arises mainly from the growing interest in social media analysis and the exploration and mining of relationships in social media data. However, with a graph-based model as a very flexible underlying data model, a graph database can serve a large variety of scenarios from different domains such as travel planning, supply chain management and package routing.
During the past months, many vendors have designed and implemented solutions to satisfy the need to efficiently store, manage and query graph data. However, the solutions are very diverse in terms of the supported graph data model, supported query languages, and APIs. With a growing number of vendors offering graph processing and graph management functionality, there is also an increased need to compare the solutions on a functional level as well as on a performance level with the help of benchmarks.
Graph database benchmarking is a challenging task. Existing graph database benchmarks are limited in their functionality and portability to different graph-based data models and different application domains. Existing benchmarks and the workloads they support are typically based on a proprietary query language and on a specific graph-based data model derived from the mathematical notion of a graph. The variety and lack of standardization with respect to the logical representation of graph data and the retrieval of graph data make it hard to define a portable graph database benchmark.
In this talk, we present a proposal and design guideline for a graph database benchmark. Typically, a database benchmark consists of a synthetically generated data set of varying size and characteristics and a workload driver. To generate graph data sets, we present parameters from graph theory which influence the characteristics of the generated graph. The workload driver then issues a set of queries against a well-defined interface of the graph database and gathers relevant performance numbers. We propose a set of performance measures to determine response-time behavior on different workloads, along with initial suggestions for typical workloads in graph data scenarios. Our main objective in this session is to open the discussion on graph database benchmarking.
We believe that there is a need for a common understanding of different workloads for graph processing from different domains and the definition of a common subset of core graph functionality in order to provide a general-purpose graph database benchmark. We encourage vendors to participate and to contribute with their domain-dependent knowledge and to define a graph database benchmark proposal.
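The generator-plus-workload-driver design described above can be sketched as follows. The "query" here is a toy in-memory neighbor lookup standing in for calls against a real graph database API; parameter names and values are illustrative.

```python
# Sketch of a graph-database benchmark driver: generate a synthetic graph
# with controllable size and degree, run a fixed query workload, and
# report a timing measure. The in-memory lookup is a stand-in for
# queries against a real graph database.
import random
import time

def generate_graph(n_nodes, avg_degree, seed=42):
    """Synthetic graph: each node gets avg_degree random neighbors."""
    rng = random.Random(seed)
    return {v: rng.sample(range(n_nodes), min(avg_degree, n_nodes))
            for v in range(n_nodes)}

def run_workload(graph, n_queries=100, seed=7):
    """Issue n_queries toy queries; return mean response time in seconds."""
    rng = random.Random(seed)
    timings = []
    for _ in range(n_queries):
        v = rng.randrange(len(graph))
        t0 = time.perf_counter()
        _ = len(graph[v])                  # the "query" under measurement
        timings.append(time.perf_counter() - t0)
    return sum(timings) / len(timings)

g = generate_graph(1000, avg_degree=8)
print(f"avg query latency: {run_workload(g):.2e} s")
```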
How Semantics Solves Big Data Challenges – DATAVERSITY
Today, organizations want both IT simplicity and innovation, but reliance on traditional databases only leads to more complexity, longer development cycles, and more silos. In fact, organizations report that the #1 impediment to big data success is having too many silos. In this webinar, we will discuss how a new database technology, semantics, solves this problem by providing a new approach to modeling data that focuses on relationships and context, making it easier for data to be understood, searched, and shared. With semantics, world-leading organizations are integrating disparate data faster and easier and building smarter applications with richer analytic capabilities—benefits that we look forward to diving into during the webinar.
Presentation giving an overview of LinkedIn's data-driven products and infrastructure, delivered on 26 Oct 2012 at the big-data symposium held in honor of the retirement of my PhD advisor, Dr. Martin H. Schultz.
LinkedIn is the world's largest professional network, connecting over 150 million professionals. Its mission is to connect professionals worldwide to make them more productive and successful. LinkedIn generates revenue through hiring solutions that provide recruiting tools to companies, marketing solutions that allow targeted advertising to professionals, and premium subscriptions tailored for individual members.
Oncrawl elasticsearch meetup france #12Tanguy MOAL
Presentation detailing how Elasticsearch is involved in Oncrawl, a SaaS solution for easy SEO monitoring.
The presentation explains how the application is built, and how it integrates Elasticsearch, a powerful general purpose search engine.
Oncrawl is data centric and elasticsearch is used as an analytics engine rather than a full text search engine.
The application uses Apache Hadoop and Apache Nutch for the crawl pipeline and data analysis.
Oncrawl is a Cogniteev solution.
[This work was presented at SIGMOD'13.]
The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the "last mile" issues in providing a rich developer ecosystem. This includes easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a 1-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digesting to descriptive analytical dashboards for our members.
The document discusses semantic systems and how they can help solve problems related to integrating different types of systems by facilitating interoperability. It outlines some of the key challenges, such as the lack of tools that are easy for average users while also being powerful enough for experts. The document also discusses different semantic technologies like ontologies, logic programming, and the Semantic Web that could help address these challenges if implemented properly with a focus on integration rather than fragmentation.
Video: https://www.youtube.com/watch?v=Rt2oHibJT4k
Technologies such as Hadoop have addressed the "Volume" problem of Big Data, and technologies such as Spark have recently addressed the "Velocity" problem – but the "Variety" problem is largely unaddressed – there is a lot of manual "data wrangling" to mange data models.
These manual processes do not scale well. Not only is the variety of data increasing, also the rate of change in the data definitions is increasing. We can’t keep up. NoSQL data repositories can handle storage, but we need effective models of the data to fully utilize it.
This talk will present tools and a methodology to manage Big Data Models in a rapidly changing world. This talk covers:
Creating Semantic Metadata Models of Big Data Resources
Graphical UI Tools for Big Data Models
Tools to synchronize Big Data Models and Application Code
Using NoSQL Databases, such as Amazon DynamoDB, with Big Data Models
Using Big Data Models with Hadoop, Storm, Spark, Giraph, and Inference
Using Big Data Models with Machine Learning to generate Predictive Models
Developer Collaborative/Coordination processes using Big Data Models and Git
Managing change – Big Data Models with rapidly changing Data Resources
Hadoop and Neo4j: A Winning Combination for Bioinformaticsosintegrators
This presentation includes an intro to bioinformatics with an emphasis on human genome re-sequencing and how Hadoop and Neo4j can be used together to open striking possibilities.
Optimizing the Data Supply Chain for Data ScienceVital.AI
As we move from the Data Warehouse to the Data Supply Chain, we open our perspective to include the full life cycle of data, from raw material to data product.
To produce data products with the most value, in an efficient and cost effective manner, quality control processes must be put into place at each link in the chain, driven by the requirements of data scientists. With such quality control processes in place, the burden of data scientists to cleanse data – typically 80% of the data scientists’ efforts – can be greatly reduced.
Data Models – including schema, metadata, rules, and provenance – play a crucial role in ensuring an effective Data Supply Chain.
Each Data Supply Chain link must be defined with firm boundaries with clear lines of team responsibility – with Data Models providing the natural borders.
In this talk we will discuss the processes that must be put into place at each link in the Data Supply Chain including perspectives on:
* The definition of Data Supply Chain vs. Data Warehouse
* Tools to create, manage, utilize, and share Data Models
* Tracking Data Provenance
* ETL processes, driven by Data Models
* Collaborative processes across Data Science teams
* Visualization of Data and Data Flow across the Data Supply Chain
* Apache Hadoop and Apache Spark as enabling technologies
* Data Science
* Cross-Organizational Collaboration
* Security
An Introduction to Graph: Database, Analytics, and Cloud ServicesJean Ihm
Graph analysis employs powerful algorithms to explore and discover relationships in social network, IoT, big data, and complex transaction data. Learn how graph technologies are used in applications such as fraud detection for banking, customer 360, public safety, and manufacturing. This session will provide an overview and demos of graph technologies for Oracle Cloud Services, Oracle Database, NoSQL, Spark and Hadoop, including PGX analytics and PGQL property graph query language.
Presented at Analytics and Data Summit, March 20, 2018
The document discusses a presentation about connecting data and Neo4j. It covers data ecosystems and where different technologies fit, how Neo4j works as a graph database, and building graph-native organizations. It also discusses Neo4j's long term vision of connecting enterprise data and the state of data in 2018. Key points include how data structures have evolved from hierarchies to dynamic knowledge graphs and how different technologies like relational databases and Neo4j are suited for different types of queries and connected data problems.
This document provides an overview of graph databases and their use cases. It begins with definitions of graphs and graph databases. It then gives examples of how graph databases can be used for social networking, network management, and other domains where data is interconnected. It provides Cypher examples for creating and querying graph patterns in a social networking and IT network management scenario. Finally, it discusses the graph database ecosystem and how graphs can be deployed for both online transaction processing and batch processing use cases.
Popular applications like Yelp, Google Maps, and Groupon have provided convenient ways to discover and blueprint all the details of a fun and exciting date. These services provide users with the ability to choose a date based on the type of activity, rating, location, deals offered, and a plethora of other options. With the improvements of mobile devices and applications, all of this planning can even be done right in the palm of our hands. However, when planning out the details of a date, users are constantly switching back and forth between applications in order to find that ideal combination of activity type, rating, location, and deals. This can take a considerable amount of time and be quite frustrating due to the amount of factors to consider and the abundance of options available on the web. Through the use of semantic web technology, everything necessary to plan a perfect date can be combined into one easy to use application. The semantic web provides meaning to every piece of information on the web through the use of ontologies, making knowledge readily available and personally tailored to each person who needs it.
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital.AI
This document provides an overview of MetaQL, which allows composing queries across NoSQL, SQL, SPARQL, and Spark databases using a domain model. Key points include:
- MetaQL uses a domain model to define concepts and compose typed queries in code that can execute across different databases.
- This separates concerns and improves developer efficiency over managing schemas and databases separately.
- Examples demonstrate MetaQL queries in graph, path, select, and aggregation formats across SQL, NoSQL, and RDF implementations.
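Since MetaQL's actual API is not shown here, the following is a hypothetical Python sketch of the general idea: a typed query composed against a domain model and rendered for more than one back end. All class and method names are invented:

```python
from dataclasses import dataclass

# Hypothetical domain model; the query below is built against the type,
# not against any one database's schema.
@dataclass
class Person:
    name: str
    age: int

class SelectQuery:
    """Toy typed query that can render to SQL or SPARQL."""
    def __init__(self, cls):
        self.cls, self.filters = cls, []

    def where(self, field, op, value):
        self.filters.append((field, op, value))
        return self

    def to_sql(self):
        conds = " AND ".join(f"{f} {op} {v!r}" for f, op, v in self.filters)
        return f"SELECT * FROM {self.cls.__name__} WHERE {conds}"

    def to_sparql(self):
        patterns = " . ".join(f"?s :{f} ?{f}" for f, _, _ in self.filters)
        conds = " && ".join(f"?{f} {op} {v!r}" for f, op, v in self.filters)
        return f"SELECT ?s WHERE {{ {patterns} FILTER({conds}) }}"

q = SelectQuery(Person).where("age", ">", 30)
print(q.to_sql())     # SELECT * FROM Person WHERE age > 30
print(q.to_sparql())
```

The separation of concerns described above falls out naturally: the query is written once against the domain model, and each back end gets its own renderer.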
Microsoft and Hortonworks Deliver the Modern Data Architecture for Big Data (Hortonworks)
Joint webinar with Microsoft and Hortonworks on the power of combining the Hortonworks Data Platform with Microsoft’s ubiquitous Windows, Office, SQL Server, Parallel Data Warehouse, and Azure platform to build the Modern Data Architecture for Big Data.
How to build your own Delve: combining machine learning, big data and SharePoint (Joris Poelmans)
You experience the benefits of machine learning every day through product recommendations on Amazon and Bol.com, credit card fraud prevention, and more. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and next we will explore how we can combine tools such as Windows Azure HDInsight, R, and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
Big Data and AI in the P2P Industry: Knowledge Graph and Inference (sfbiganalytics)
The document discusses how Puhui Finance, a Chinese P2P lending company, uses big data and AI techniques for risk control. It introduces their Feature Compute Engine, which converts unstructured user data into structured features, and their Knowledge Graph, which connects entities and analyzes relationships. Specific use cases discussed include anti-fraud detection using rules, contact recovery by building phone networks, and detecting high-risk individuals via search engines. Challenges around unstructured data, name disambiguation, reasoning and lack of training data are also covered.
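The phone-network anti-fraud idea can be illustrated with a toy sketch (not Puhui's actual system; all data below is invented): build a contact graph from call records and flag users who sit within a few hops of known fraudulent accounts.

```python
from collections import deque

# Invented call/contact records: each tuple is an undirected edge.
CONTACTS = [
    ("u1", "u2"), ("u2", "u3"), ("u3", "fraud1"),
    ("u4", "u5"),
]
KNOWN_FRAUD = {"fraud1"}

def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def risk_distance(graph, user, max_hops=2):
    """Hops from `user` to the nearest known-fraud node, else None."""
    seen, queue = {user}, deque([(user, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in KNOWN_FRAUD:
            return hops
        if hops == max_hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None

graph = build_graph(CONTACTS)
print(risk_distance(graph, "u2"))  # 2 (u2 -> u3 -> fraud1)
print(risk_distance(graph, "u4"))  # None (no path to a fraud node)
```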
Before jumping straight into development of such a graph-based app, we asked the question that anyone would ask: "What makes it a case for Neo4j, and can you prove it?" Basically, de-risking and making a case for management buy-in. Further, it's just as much about convincing ourselves, hence this comparison.
So this is about that comparison and the white paper that resulted from it. It is not the actual project. The source code used to generate the comparison numbers is available at https://github.com/EqualExperts/Apiary-Neo4j-RDBMS-Comparison
Graph Databases - Where Do We Do the Modeling Part? (DATAVERSITY)
Graph processing and graph databases have been with us for a while. However, since their physical implementations are the same for every database in production (node connected to node, or triples), there's a perception that data modeling (and data modelers) have no role on projects where graph databases are used.
This month we'll talk about where graph databases are a best fit in a modern data architecture and where data models add value.
Introduction to Microsoft HDInsight and BI Tools (DataWorks Summit)
This document discusses Hortonworks Data Platform (HDP) for Windows. It includes an agenda for the presentation which covers an introduction to HDP for Windows, integrating HDP with Microsoft tools, and a demo. The document lists the speakers and provides information on Windows support for Hadoop components. It describes what is included in HDP for Windows, such as deployment choices and full interoperability across platforms. Integration with Microsoft tools like SQL Server, Excel, and Power BI is highlighted. A demo of using Excel to interact with HDP is promised.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Challenges in the Design of a Graph Database Benchmark (graphdevroom)
Graph databases are one of the leading drivers in the emerging, highly heterogeneous landscape of database management systems for non-relational data management and processing. The recent interest and success of graph databases arises mainly from the growing interest in social media analysis and the exploration and mining of relationships in social media data. However, with a graph-based model as a very flexible underlying data model, a graph database can serve a large variety of scenarios from different domains such as travel planning, supply chain management and package routing.
During the past months, many vendors have designed and implemented solutions to satisfy the need to efficiently store, manage and query graph data. However, the solutions are very diverse in terms of the supported graph data model, supported query languages, and APIs. With a growing number of vendors offering graph processing and graph management functionality, there is also an increased need to compare the solutions on a functional level as well as on a performance level with the help of benchmarks.
Graph database benchmarking is a challenging task. Already existing graph database benchmarks are limited in their functionality and portability to different graph-based data models and different application domains. Existing benchmarks and the supported workloads are typically based on a proprietary query language and on a specific graph-based data model derived from the mathematical notion of a graph. The variety and lack of standardization with respect to the logical representation of graph data and the retrieval of graph data make it hard to define a portable graph database benchmark.
In this talk, we present a proposal and design guideline for a graph database benchmark. Typically, a database benchmark consists of a synthetically generated data set of varying size and varying characteristics and a workload driver. In order to generate graph data sets, we present parameters from graph theory which influence the characteristics of the generated graph data set. The workload driver then issues a set of queries against a well-defined interface of the graph database and gathers relevant performance numbers. We propose a set of performance measures to determine the response time behavior on different workloads, and also initial suggestions for typical workloads in graph data scenarios. Our main objective for this session is to open the discussion on graph database benchmarking.
We believe that there is a need for a common understanding of different workloads for graph processing from different domains and the definition of a common subset of core graph functionality in order to provide a general-purpose graph database benchmark. We encourage vendors to participate and to contribute with their domain-dependent knowledge and to define a graph database benchmark proposal.
How Semantics Solves Big Data Challenges (DATAVERSITY)
Today, organizations want both IT simplicity and innovation, but reliance on traditional databases only leads to more complexity, longer development cycles, and more silos. In fact, organizations report that the #1 impediment to big data success is having too many silos. In this webinar, we will discuss how a new database technology, semantics, solves this problem by providing a new approach to modeling data that focuses on relationships and context, making it easier for data to be understood, searched, and shared. With semantics, world-leading organizations are integrating disparate data faster and easier and building smarter applications with richer analytic capabilities—benefits that we look forward to diving into during the webinar.
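The relationship-and-context modeling style described above can be illustrated with a toy triple store (real semantic databases use RDF and SPARQL; this is only a sketch with invented facts):

```python
# Facts as subject-predicate-object triples, queried by pattern matching.
TRIPLES = {
    ("alice", "worksFor", "Acme"),
    ("Acme", "locatedIn", "Boston"),
    ("alice", "knows", "bob"),
    ("bob", "worksFor", "Globex"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in TRIPLES
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

print(match(p="worksFor"))   # who works for whom
print(match(s="alice"))      # everything known about alice
```

Because every fact uses the same three-part shape, new sources can be merged into the same store without a schema migration, which is the anti-silo argument the webinar makes.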
Presentation on an overview of LinkedIn data-driven products and infrastructure, given on 26 Oct 2012 at the big data symposium held in honor of the retirement of my PhD advisor, Dr. Martin H. Schultz.
LinkedIn is the world's largest professional network, connecting over 150 million professionals. Its mission is to connect professionals worldwide to make them more productive and successful. LinkedIn generates revenue through hiring solutions that provide recruiting tools to companies, marketing solutions that allow targeted advertising to professionals, and premium subscriptions tailored for individual members.
LinkedIn provides the world's largest professional network, connecting over 161 million professionals. Its mission is to connect professionals worldwide to help them be more productive and successful. LinkedIn generates revenue through hiring solutions, marketing solutions, and premium subscriptions that provide valuable tools to professionals and opportunities for companies.
Left Brain, Right Brain: How to Unify Enterprise Analytics (Inside Analysis)
The Briefing Room with Robin Bloor and Teradata
Live Webcast on Jan. 29, 2013
Despite its name, effective Data Science requires a certain amount of artistic flair. Analysts must be creative about how and where they find the insights that will drive business value. One classic roadblock to that kind of frictionless process? Programming. Not everyone can code Java, which makes the unstructured domain of Hadoop quite challenging for the average business analyst.
Check out the slides from this episode of the Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how a new generation of analytical platforms will solve the complexity of unifying structured and unstructured data. He'll be briefed by Steve Wooledge of Teradata Aster who will tout his company's Big Data Appliance, which leverages the SQL-H bridge, an innovation designed to connect Hadoop with SQL.
Visit: http://www.insideanalysis.com
The Briefing Room with Colin White and Composite Software
Live Webcast Feb. 26, 2013
The modern business analyst needs data from all over the place: yes, the data warehouse, but also the Web, big data, production systems, as well as via partners and vendors. In fact, the typical analyst spends more than 50% of the time chasing data, which slows delivery of analytic insights and limits the time available for thorough analysis. Some practitioners refer to this conundrum as "the data problem."
Check out the slides from this episode of The Briefing Room to hear veteran Analyst Colin White of BI Research as he explains why analytical sandboxes and data hubs can be an analyst's best friend. He'll be briefed by Bob Eve of Composite Software who will discuss his company's mature data virtualization platform, which includes a number of capabilities that help organizations leverage agile analytics. He will discuss why time-to-insight is fast becoming the battle cry of analysis-driven organizations.
Visit: http://www.insideanalysis.com
Great data leads to great insights which leads to great products.
Vitaly Gordon, senior products data scientist, talks about the culture, people and tools that have helped LinkedIn become the world’s leading professional social network and one of the most visited sites on the web.
This document discusses different analytics tools for marketing and advertising requirements. It compares paid vs free tools and outlines key factors to consider such as business type, legal risks, integration capabilities, service and support offerings. The panel then provides examples from Budget Direct's experience using Omniture tools for cross-channel campaign measurement and leveraging customer data insights. Integration of tools and a focus on innovation is highlighted as important for maximizing ROI and marketing effectiveness.
When Worlds Collide: Intelligence, Analytics and Operations (Inside Analysis)
The Briefing Room with Shawn Rogers and Composite Software
Slides from the Live Webcast on May 15, 2012
Everyone wants more data these days, though often for different reasons. Business analysts, data scientists and front-line workers all know the value of having that extra piece of information. The big question remains -- how can all these needs be supported without taxing IT and without breaking the bank? And how can the worlds of traditional Business Intelligence, Big Data Analytics and Transaction Systems combine to improve business outcomes?
In this episode of The Briefing Room, veteran Analyst Shawn Rogers of Enterprise Management Associates explains what is needed to take advantage from today's hybrid data ecosystem. He'll be briefed by Bob Eve of Composite Software who will explain how innovative enterprises are using data virtualization to gain insight across these worlds and doing so with greater agility and lower costs.
For more information visit: http://www.insideanalysis.com
Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E
The document discusses big data and analytics. It notes that expectations for business intelligence are changing as data grows exponentially in volume, velocity, variety and complexity. Big data requires new approaches and tools that can handle unstructured data, scale easily, and perform analytics in real-time. The document provides examples of how various industries like pharmaceuticals, financial services, and manufacturing can gain insights from big data through applications like fraud detection, customer management, and supply chain optimization.
This document discusses the evolution of integrated workforce experiences driven by new technologies and business demands. It describes how Cisco's solutions can connect people, resources, and content to empower employees through personalized communication, collaboration, and learning capabilities. The goal is to drive productivity, growth, and innovation across industries by delivering an integrated user experience through applications and services powered by Cisco technologies.
The document describes the Digital Enterprise Research Institute (DERI) and its work on enabling networked knowledge. DERI aims to link scientific research with industry through fundamental research, technology development, and education. Its goals include exploiting big data and enabling smart cities through removing data silos and leveraging linked open data. DERI is developing technologies like the Semantic Sensor Network ontology, CoAP protocol, and Continuous Query Evaluation over Linked Streams (CQELS) to process sensor data and queries over linked streams and datasets in real-time.
InfoFusion is an information access platform from OpenText that allows users to discover, analyze, and act on information from across an organization. It connects to different data sources, extracts metadata, and provides a unified search index. The roadmap outlines expanding connectors, search and analytics capabilities, and embeddable user interface components over the next three years. It aims to address issues like information silos, complex IT environments, and the need to access both structured and unstructured data.
The Comprehensive Approach: A Unified Information Architecture (Inside Analysis)
The Briefing Room with Richard Hackathorn and Teradata
Slides from the Live Webcast on May 29, 2012
The worlds of Business Intelligence (BI) and Big Data Analytics can seem at odds, but only because we have yet to fully experience a comprehensive approach to managing big data – a Unified Big Data Architecture. The dynamics continue to change as vendors begin to emphasize the importance of leveraging SQL, engineering and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing.
Register for this episode of The Briefing Room to learn the value of taking a strategic approach for managing big data from veteran BI and data warehouse consultant Richard Hackathorn. He'll be briefed by Chris Twogood of Teradata, who will outline his company's recent advances in bridging the gap between Hadoop and SQL to unlock deeper insights and explain the role of Teradata Aster and SQL-MapReduce as a Discovery Platform for Hadoop environments.
For more information visit: http://www.insideanalysis.com
Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E
Evaluating Big Data Predictive Analytics Platforms (Teradata Aster)
Mike Gualtieri, Principal Analyst, Forrester Research, presents at the Big Analytics Roadshow, 2012 in New York City on December 12, 2012
Presentation title: Evaluating Big Data Predictive Analytics Platforms
Abstract: Great. You have Big Data. Now what? You have to analyze it to find game-changing predictive models that you can use to make smart decisions, reduce risk, or deliver breakthrough customer experiences. Big Data Predictive Analytics solutions are software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources. In this session, Forrester Principal Analyst Mike Gualtieri will discuss the key criteria you should use to evaluate Big Data Predictive Analytics platforms to meet your specific needs.
Investigative Analytics - What's in a Data Scientist's Toolbox (Data Science London)
Design Considerations For Enterprise Social Networks: Identity, Graphs, Strea... (Mike Gotta)
Organizations can improve how employees connect to co-workers by understanding the influence design has on participation within social platforms. This session examines key social networking building blocks and how design practices should accommodate multiple networking strategies as employees seek to mobilize their connections to satisfy different work and professional needs. Attendees will gain a better understanding of social networking technology found within social platforms, insight into the cultural aspects of social networks, and how social networking strategies help people cultivate relationships and build social capital they can later leverage to achieve work and professional goals.
Presented at E2.0 Boston June 2012. This version of the deck puts builds on separate slides to display properly on Slideshare.
Presentation for the Greater Salem Chamber of Commerce and Windham Community Development regarding Facebook and Google+ marketing strategies for small business and professionals.
For more information about how your business can best utilize Facebook and Google+ as part of your marketing strategy, please follow us on Facebook at www.facebook.com/108degrees or on Google+ at www.gplus.to/108degrees or contact me for a private consultation.
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ... (City University London)
This document discusses a multi-disciplinary approach called RED (Research, Envision, Design) for experience design. It emphasizes hypothesis-driven research, capability modeling, and scenario planning to envision solutions. The design process involves concept and product design, workload definition, and user-centered and comparative design. Delivery focuses on proof of concepts, vision demonstrators, and investment cases. Examples discussed include an assisted living innovation platform and projects helping organizations promote digital literacy and envision breakthrough customer experiences.
Similar to A Small Overview of Big Data Products, Analytics, and Infrastructure at LinkedIn (20)
This talk was given by Jun Rao (Staff Software Engineer at LinkedIn) and Sam Shah (Senior Engineering Manager at LinkedIn) at the Analytics@Webscale Technical Conference (June 2013).
LinkedIn Segmentation & Targeting Platform: A Big Data Application (Amy W. Tang)
This talk was given by Hien Luu (Senior Software Engineer at LinkedIn) and Siddharth Anand (Senior Staff Software Engineer at LinkedIn) at the Hadoop Summit (June 2013).
Espresso: LinkedIn's Distributed Data Serving Platform (Talk) (Amy W. Tang)
This talk was given by Swaroop Jagadish (Staff Software Engineer @ LinkedIn) at the ACM SIGMOD/PODS Conference (June 2013). For the paper written by the LinkedIn Espresso Team, go here:
http://www.slideshare.net/amywtang/espresso-20952131
Espresso: LinkedIn's Distributed Data Serving Platform (Paper) (Amy W. Tang)
This paper, written by the LinkedIn Espresso Team, appeared at the ACM SIGMOD/PODS Conference (June 2013). To see the talk given by Swaroop Jagadish (Staff Software Engineer @ LinkedIn), go here:
http://www.slideshare.net/amywtang/li-espresso-sigmodtalk
This document provides an overview of LinkedIn's data infrastructure. It discusses LinkedIn's large user base and data needs for products like profiles, communications, and recommendations. It describes LinkedIn's data ecosystem with three paradigms for online, nearline and offline data. It then summarizes key parts of LinkedIn's data infrastructure, including Databus for change data capture, Voldemort for distributed key-value storage, Kafka for messaging, and Espresso for distributed data storage. Overall, the document outlines how LinkedIn builds scalable data solutions to power its products and services for its large user base.
This document describes Databus, a system used at LinkedIn for distributed data replication and change data capture. Some key points:
- Databus provides timeline consistency across distributed data systems by applying a logical clock to data changes and using a pull-based model for replication.
- It addresses the challenges of specialization in distributed data systems through standardization, isolation of consumers from sources, and handling slow consumers without impacting fast ones.
- The architecture includes fetchers that extract changes from databases, a relay for buffering changes, log and snapshot stores, and client libraries that allow applications to consume changes.
- Performance is optimized through partitioning, filtering, and scaling of consumers independently of sources.
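A minimal sketch of the pull-based, logical-clock design described above (illustrative names, not Databus's actual API):

```python
# Every change gets a monotonically increasing logical sequence number
# (SCN), and consumers *pull* from the relay at their own pace, so a
# slow consumer never blocks a fast one.
class Relay:
    def __init__(self):
        self.log = []          # in-memory change log of (scn, change)
        self.next_scn = 0      # logical clock

    def append(self, change):
        self.log.append((self.next_scn, change))
        self.next_scn += 1

    def pull(self, since_scn, limit=10):
        """Return up to `limit` changes with SCN >= since_scn."""
        return [entry for entry in self.log if entry[0] >= since_scn][:limit]

class Consumer:
    def __init__(self, relay):
        self.relay, self.checkpoint = relay, 0

    def poll(self):
        batch = self.relay.pull(self.checkpoint)
        if batch:
            self.checkpoint = batch[-1][0] + 1   # resume point
        return [change for _, change in batch]

relay = Relay()
for change in ("INSERT a", "UPDATE a", "INSERT b"):
    relay.append(change)

fast, slow = Consumer(relay), Consumer(relay)
print(fast.poll())   # ['INSERT a', 'UPDATE a', 'INSERT b']
relay.append("DELETE a")
print(fast.poll())   # ['DELETE a']
print(slow.poll())   # all four changes, at its own pace
```

The checkpoint also gives timeline consistency: replaying from any SCN yields changes in the same logical order every time.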
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe (Precisely)
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
leewayhertz.com - AI in Predictive Maintenance: Use Cases, Technologies, Benefits ... (alexjohnson7307)
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
FREE A4 Cyber Security Awareness Posters - Social Engineering part 3 (Data Hops)
Free A4 downloadable and printable cyber security and social engineering safety training posters. Promote security awareness in the home or workplace. Lock them out. From training providers at datahops.com.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
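A pure-Python stand-in for the serving side of that pipeline (invented vectors; real deployments use learned embeddings and Milvus's approximate-nearest-neighbor index):

```python
import math

# Documents become vectors; search returns nearest neighbors by cosine
# similarity over a brute-force scan of this toy in-memory "index".
DOCS = {
    "doc1": [1.0, 0.0, 0.5],
    "doc2": [0.9, 0.1, 0.4],
    "doc3": [0.0, 1.0, 0.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def search(query_vec, k=2):
    scored = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]),
                    reverse=True)
    return scored[:k]

print(search([1.0, 0.0, 0.5]))  # ['doc1', 'doc2']
```

In the architecture the talk describes, Spark would compute the vectors offline and Milvus would replace the brute-force scan with an ANN index that scales to billions of vectors.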
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the CCB and CCX licensing model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some approaches that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of what is going on. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin... (Tatiana Kojar)
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation for the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
12. LinkedIn Recommendation Engine
Recommendation entities: People, Jobs, Groups, Companies, Ads, Searches, News, Events ... and more.
Products: Referral Center, People Browse, Similar Profiles, Similar Groups, Jobs You May Be Interested In, Jobs Browse, Browse Map, TalentMatch, Similar Jobs, GYML, and more, with A/B testing and an API.
Recommendation types: behavior analysis, popularity, collaborative filtering, user feedback.
Shared, dynamic, unified core service: real-time (R-T) feature extraction, entity resolution & enrichment; real-time matching computations; offline data munging (Hadoop).
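One of the recommendation types above, collaborative filtering, can be sketched with a tiny item co-occurrence model (invented data; the production system combines this with the other signals listed):

```python
from collections import Counter
from itertools import combinations

VIEWS = {                      # member -> items they interacted with
    "m1": {"jobA", "jobB"},
    "m2": {"jobA", "jobB", "jobC"},
    "m3": {"jobB", "jobC"},
}

# Count how often each pair of items appears for the same member.
cooccur = Counter()
for items in VIEWS.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def similar_items(item, k=2):
    """Items most often co-viewed with `item`, best first."""
    scores = {b: n for (a, b), n in cooccur.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(similar_items("jobA"))  # ['jobB', 'jobC']
```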
15. LinkedIn Data Infrastructure: Sample Stack
Infra challenges in the 3-phase ecosystem are diverse, complex and specific. Some components are off-the-shelf; there has been significant investment in home-grown, deep and interesting platforms.