This document discusses the engineering challenges of vertical search engines. It describes how vertical search differs from traditional search by querying structured data instead of free text. It outlines the challenges of retrieving web data at scale, processing unstructured data, building distributed data infrastructures, performing vertical search, conducting data analytics, and implementing computational advertising on structured datasets. The document emphasizes that vertical search at web scale provides many opportunities for research and engineering across computer science fields.
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Fraud – Connected Data World
Financial crime prevention is something that affects everyone in one way or another. From the Deutsche Banks of the world to small and medium online merchants, regulations for anti-money laundering, know your customer, and customer due diligence apply.
Failing to comply with such regulations can bring on substantial fines. Even more importantly, it can hurt the bottom line and reputation of businesses, having far-reaching side effects. Complying with such regulations, and actively cracking down on financial crime, however, is not easy.
Cross-referencing interconnected data across various datasets, and trying to apply detection rules and to discover patterns in the data is complicated. It takes expertise, effort, and the right technology to be able to do this efficiently.
A natural and efficient way of looking for patterns and applying rules in troves of interconnected data is to model and view that data as a graph. By modeling data as a graph, and applying graph-based algorithms such as PageRank or Centrality, traversing paths, discovering connections and getting insights becomes possible.
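The idea above can be sketched in a few lines: a toy PageRank over a hypothetical payment graph, in which the account that all flows funnel into floats to the top of the scores. Account names, edges, and parameters here are made up purely for illustration, not taken from any of the talks.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iteratively compute PageRank scores for every node in a directed graph
    given as {node: [successor, ...]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
        # Mass held by dangling nodes (no outgoing payments) is spread evenly.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        for n in nodes:
            new_rank[n] += damping * dangling / len(nodes)
        rank = new_rank
    return rank

# Hypothetical payment graph: several accounts all funnel into one account.
payments = {
    "acct_a": ["acct_b", "acct_c"],
    "acct_b": ["acct_c"],
    "acct_c": ["acct_mule"],
    "acct_d": ["acct_mule"],
}
scores = pagerank(payments)
top = max(scores, key=scores.get)   # the funnel account scores highest
```

The highest-ranked node is the account that accumulates incoming flow, which is exactly the kind of signal an investigator would triage first.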
Graphs and graph databases are the fastest-growing area of data management technology for a number of reasons. One of the reasons is because they are a perfect match for use cases involving interconnected data.
Queries that would be very complicated to express and very slow to execute with relational databases or other NoSQL technologies are feasible with graph databases. As modern financial markets grow more complex, detecting financial crime requires going 4 to 11 levels deep into the account–payment graph, which calls for a different solution than either relational or NoSQL databases.
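A depth-limited path search makes the "4 to 11 levels deep" point concrete: a short chain of pass-through accounts is invisible to a shallow query and only appears once the traversal budget is deep enough. The graph, account names, and depth bounds below are illustrative assumptions, not data from the talk.

```python
def paths_up_to_depth(graph, source, target, max_depth):
    """Return all simple paths from source to target using at most max_depth hops."""
    results, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target and len(path) > 1:
            results.append(path)
            continue
        if len(path) - 1 >= max_depth:      # hop budget exhausted
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:             # keep paths simple (no revisits)
                stack.append((nxt, path + [nxt]))
    return results

# A chain of pass-through accounts hiding the link between "source" and "sink".
transfers = {
    "source": ["m1"], "m1": ["m2"], "m2": ["m3"], "m3": ["m4"], "m4": ["sink"],
}
shallow = paths_up_to_depth(transfers, "source", "sink", 3)  # finds nothing
deep = paths_up_to_depth(transfers, "source", "sink", 5)     # finds the chain
```

In a graph database this is a single variable-length path query; in SQL the same search would need one self-join per hop, which is why deep traversals become impractical there.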
How are organizations such as Alibaba, OpenCorporates, and Visa using graph database technology to not just stay on top of regulation, but be one step ahead in the race against financial crime?
Is it possible to do this in real time?
What do graph query languages have to do with this?
Graph intelligence: the future of data-driven investigations – Connected Data World
RDF and graph databases are on the rise. The performance, flexibility, and scalability of these systems are attracting many organizations struggling with complex, connected data. While the graph approach offers several advantages, finding insights in the enormous volume of data remains a challenge.
In this presentation, we will introduce Graph Intelligence, an advanced combination of human and computer-based intelligence to find insights faster in complex connected datasets. We will explain why we believe this approach is the future for teams of investigators fighting financial crime, national security threats or cyber attacks.
From this presentation, you will learn:
The nature and benefits of the Graph Intelligence approach
How to build a platform leveraging graph technology
Real-life examples of money laundering and financial crimes detection and investigation
AI is transforming the financial media industry – impacting everything from content creation to consumption trends. Clancy Childs, general manager of Dow Jones’ knowledge enablement unit, will share insights into how Dow Jones is reimagining what the news looks like.
Learn how Dow Jones’ knowledge graph platform – powered by Stardog – enables the company to unify structured and unstructured data from a vast range of news sources and deliver cutting-edge insights for customers and partners globally.
SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY – Amit Sheth
Amit Sheth, "Semantic Content Management for Enterprises and National Security," Keynote at CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002.
Data Cloud – Yury Lifshits – Yahoo! Research
In this talk we address two questions:
1) How to use structured data in web search?
2) How to gather structured data?
For the first question, we identify valuable classes of data, present query classes that can benefit from structured data, and describe an architecture that combines keyword search with structured search.
For the second question, we present Data Cloud: an ecosystem of data publishers, a search engine (the data cloud), and data consumers. We connect the Data Cloud strategy to a classic notion in economics: the network effect in two-sided markets. At the end of the talk, an early demo implementation will be presented.
Building Predictive Analytics on Big Data Platforms – Olha Hrytsay
SoftServe Innovation Conference in Austin, Texas 2013
Building Predictive Analytics on Big Data Platforms presented by Olha Hrytsay (BI Consultant) and Serhiy Shelpuk (Lead Data Scientist)
Óscar Méndez - Big Data: from scientific research to business management – Fundación Ramón Areces
On July 3, 2014, we organized at the Fundación Ramón Areces a one-day conference under the theme 'Big Data: from scientific research to business management'. There we examined the challenges and opportunities of Big Data in the social sciences, in economics, and in business management. Among other speakers were experts from the London School of Economics, BBVA, Deloitte, the Universities of Valencia and Oviedo, and the Centro Nacional de Supercomputación...
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach – Andre Freitas
Big Data is based on the vision of providing users and applications with a more complete picture of reality, supported and mediated by data. This vision comes with the inherent price of data variety, i.e., data that is semantically heterogeneous, poorly structured, complex, and afflicted by quality issues. Despite the hype around technologies targeting data volume and velocity, solutions for coping with data variety remain fragmented, with limited adoption. In this talk we focus on emerging data management approaches, supported by semantic technologies, for coping with data variety. We provide a broad overview of semantic computing approaches and how they can be applied to data management challenges within organizations today. The talk gives the audience a glimpse into next-generation, Big Data-driven information systems.
Applications of Semantic Technology in the Real World Today – Amit Sheth
Amit Sheth, "Applications of Semantic Technology in the Real World Today," talk given at Semantic Technology Conference, San Jose, CA, March 2005.
This talk reviews real-world applications, mainly deployed in the financial services industry, developed on the Semagix Freedom platform described at http://knoesis.org/library/resource.php?id=810 . The technology is based on this patent: "Semantic web and its applications in browsing, searching, profiling, personalization and advertising", http://knoesis.org/library/resource.php?id=843 .
Amit Sheth founded Taalee in 1999, which merged with Voquette in 2002, and then with Semagix in 2004.
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ... – Amazon Web Services
The Howard Hughes Corporation partnered with 47Lining to develop a managed enterprise data lake based on Amazon S3. The purpose of the managed EDL is to fuse relevant on-premises and third-party data to enable Howard Hughes to answer its most valuable business questions. Their first analysis was a lead-scoring model that uses Amazon Machine Learning (Amazon ML) to predict propensity to purchase high-end real estate. The model is based on a combined set of public and private data sources, including all publicly recorded real estate transactions in the US for the past 35 years. By changing their business process for identifying and qualifying leads to use the results of data-driven analytics from their managed data lake in AWS, Howard Hughes increased the number of identified qualified leads in their pipeline by over 400% and reduced the acquisition cost per lead by more than 10 times. In this session, you will see a practical example of how to use Amazon ML to improve business results, how to architect a data lake with Amazon S3 that fuses on-premises, third-party, and public data sets, and how to train and run an Amazon ML model to attain predictive accuracy.
Real-time big data analytics based on product recommendations case study – deep.bi
We started as an ad network. The challenge was to recommend the best product (out of millions) to the right person at any given moment (thousands of users per second). We have delivered 5 billion ad views over the past 24 months. To put that scale in context: if we served 1 ad per second, it would take about 160 years to serve 5 billion ads.
So we needed a solution. SQL databases did not work. Popular NoSQL databases did not work. Standard data warehouse approaches (pre-aggregation, creating schemas) did not work either.
Rethinking all the problems posed by the huge data streams flowing to us every second, we built a complete solution based on open-source technologies and fresh, smart ideas from our engineering team. It is called deep.bi, and we now make it available to other companies.
deep.bi lets high-growth companies solve fast data problems by providing scalable, flexible and real-time data collection, enrichment and analytics.
It was built using:
- Node.js - API
- Kafka - collecting and distributing data
- Spark Streaming - ETL, data enrichments
- Druid - real-time analytics
- Cassandra - user events store
- Hadoop + Parquet + Spark - raw data store + ad-hoc queries
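The stack above implements a collect, enrich, aggregate flow at very large scale. As a purely illustrative, single-process sketch of that same flow (event fields, the geo lookup, and the enrichment rule are assumptions, not deep.bi's actual schema), the stages can be modeled as a Python generator pipeline:

```python
from collections import Counter

def collect(raw_events):
    """Stand-in for the Kafka step: yield raw events one at a time."""
    yield from raw_events

def enrich(events, geo_lookup):
    """Stand-in for the Spark Streaming ETL step: attach a country code."""
    for event in events:
        yield dict(event, country=geo_lookup.get(event["ip"], "unknown"))

def aggregate(events):
    """Stand-in for the Druid rollup: count ad views per country."""
    return Counter(e["country"] for e in events if e["type"] == "ad_view")

raw = [
    {"type": "ad_view", "ip": "1.2.3.4"},
    {"type": "ad_view", "ip": "5.6.7.8"},
    {"type": "click", "ip": "1.2.3.4"},  # not an ad view, excluded from the rollup
]
geo = {"1.2.3.4": "PL", "5.6.7.8": "DE"}

views_by_country = aggregate(enrich(collect(raw), geo))
```

Because each stage consumes a stream and yields a stream, the stages compose the same way the distributed components do, just without the fault tolerance and throughput that Kafka, Spark, and Druid provide.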
We've known for years that data-driven content was a 'thing': we'd produce simple infographics sharing a few statistics, and they'd easily get traction for us online. The game has lifted: consumers are becoming more and more obsessed with data and are now demanding higher-quality, more complex data-driven content. The challenge for us as "T-shaped" marketers is that we face increasing demands to learn new skills to produce this content, but we don't have the time to do so among all the other things we need to be expert at.
This presentation is going to give you specific help on how to produce data-driven content without any programming skill. After watching this presentation you'll have the confidence to build your own data-driven content with the knowledge of:
- blueprints for data-driven content ideas
- scraping tools, frameworks and methodologies
- how to brief in a data scraping project to your in-house team or a freelancer
- how to turn your data into visually appealing content
- channels for promoting data-driven content to ensure it gets traction
Building a Real-Time Geospatial-Aware Recommendation Engine – Amazon Web Services
Recommendation engines help your prospects and customers find the most relevant offers and content. In this presentation, you will learn how to use AWS building blocks to build your own location-aware recommendation engine. You’ll see how to store real-time events using Amazon Kinesis and Amazon DynamoDB. See how to easily move data into Amazon Redshift using Kinesis Firehose. As your site or app rises in popularity, you’ll need to track a wider variety of events and scale to handle traffic and usage spikes. Learn architectural patterns for processing large datasets and high-request volume applications.
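The core of a location-aware recommender is ranking candidate offers by distance to the user. As a minimal sketch of that ranking step only (offer names and coordinates are invented; the AWS services above handle ingestion, storage, and scale), using the standard haversine formula:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def recommend(user_lat, user_lon, offers, k=2):
    """Return the names of the k offers closest to the user's position."""
    ranked = sorted(offers, key=lambda o: haversine_km(user_lat, user_lon, o["lat"], o["lon"]))
    return [o["name"] for o in ranked[:k]]

offers = [
    {"name": "coffee_downtown", "lat": 40.7128, "lon": -74.0060},  # New York
    {"name": "pizza_brooklyn", "lat": 40.6782, "lon": -73.9442},   # Brooklyn
    {"name": "tacos_la", "lat": 34.0522, "lon": -118.2437},        # Los Angeles
]

# A user standing in Manhattan sees the two New York offers first.
nearby = recommend(40.7300, -74.0000, offers)
```

In production the linear scan over offers would be replaced by a geospatial index; the distance function and ranking logic stay the same.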
Data Science, Personalisation & Product Management – Bhaskar Krishnan
Does Data Matter?
Why are we discussing Data & Data Science?
Why is it relevant to Product Management?
What is Identity?
How do we understand users?
How do we Personalise user experiences?
What is Risk and Trust & Safety?
Big Data Explained - Case study: Website Analytics – deep.bi
This is an example case study showing what big data can mean for a small website that generates just 5000 visits a day.
It all depends on what we want to get from our assets, such as website traffic. If we only measure the number of people who visited our site, then we do not need to worry about "big data". We just have to count total visits (5,000 a day, 150,000 monthly).
But using just this simple measure, we know nothing about our visitors and customers. So it is pretty useless.
On the following slides we present what a website owner can gain from advanced website analytics and why big data technologies are recommended.
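The contrast between a raw count and event-level analytics can be shown in a few lines. The visit records below are made up for illustration; the point is that the same traffic, kept as events rather than a single total, answers richer questions:

```python
from collections import Counter

# Individual visit events instead of one opaque total:
visits = [
    {"user": "u1", "page": "/pricing"},
    {"user": "u1", "page": "/pricing"},
    {"user": "u2", "page": "/blog"},
    {"user": "u3", "page": "/pricing"},
]

total_visits = len(visits)                           # the "simple measure"
unique_visitors = len({v["user"] for v in visits})   # who actually came
top_page = Counter(v["page"] for v in visits).most_common(1)[0][0]
```

At 5,000 visits a day this fits in memory; big data technologies become relevant once the per-event detail (pages, sessions, devices, referrers) is retained over months rather than discarded into a single counter.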
Semantic Interoperability & Information Brokering in Global Information Systems – Amit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277); InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search; MREF (putting metadata on HREF) was way ahead of its time (see: http://knoesis.org/library/resource.php?id=00294); multi-ontology query processing in the OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
Ranking in Google Since The Advent of The Knowledge Graph – Bill Slawski
A two-person panel discussion/presentation by Bill Slawski and Barbara Starr on June 23, 2015
The Lotico Semantic Web of San Diego
The SEO San Diego Meetup
The SEM San Diego Meetup
http://www.meetup.com/InternetMarketingSanDiego/events/222788495/
User experience drives search engines, and hence their results. Search Engine Result Presentation/Placements naturally follow that route.
This means that search results are no longer based exclusively on ranking criteria. Among the other critical factors are the notion of 'ordering vs. ranking', the impact of context, and many more.
Similar to Engineering challenges in vertical search engines
Trends in Software Development: from Outsourcing to Crowdsourcing and Collaboration in an Open Environment – ITDogadjaji.com
The presentation "Trends in Software Development: from Outsourcing to Crowdsourcing and Collaboration in an Open Environment", given by Martin Jähn at the DANUB.IT conference in October 2011.
How to deal with the media without screwing up – ITDogadjaji.com
The presentation "How to deal with the media without screwing up", given by Mike Butcher at the How to Web Belgrade conference on June 16, 2011, in Belgrade.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deployment Firewall and DBOM – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) involves many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You Need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 – Tobias Schneck
As AI technology pushes into IT, I wondered, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies that could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
1. +
Engineering Challenges
in Vertical Search Engines
Aleksandar Bradic, Senior Director,
Engineering and R&D, Vast.com
2. +
Introduction
Vertical Search
Search focused on vertical data
Vertical Data – data inherently described by its structure:
Items/Properties for sale (Automotive, Real Estate..)
Geographical Data (Neighborhoods, Locations..)
Services (Hotels, Transportation..)
Businesses (Restaurants, Nightlife..)
Events (Concerts, Plays..)
Auction items (Collectibles, Art..)
Metadata (News, Social Data, Reviews..)
…
3. +
Introduction
Vertical Search != Full Text Search
Full Text Search queries:
“Cheap tickets for Broadway shows this week”
“Trendy Restaurants in San Francisco near SoMa”
“3-day trips from NYC to anywhere under $1000”
Vertical Search queries:
“price-sorted results below two standard deviations from the tickets category with Broadway as location and a date range of 2010-04-11 to 2010-04-18”
“distance-sorted results relative to the center of SF/SoMa matching the appropriate threshold of a composite score of user review scores and historical change in query/review volume”
“total cost-sorted results for all 3-day intervals within the next 6 months combining hotel and airfare price below a max value of $1000 for all valid locations”
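The contrast with full-text search can be made concrete: a vertical search query is not a bag of words but a structured filter-and-sort specification over typed fields. A minimal sketch of the first query above, assuming an illustrative record schema and interpreting the cutoff as two standard deviations above the subset's mean price:

```python
from statistics import mean, stdev

# Toy inventory of structured ticket listings (field names are illustrative).
listings = [
    {"category": "tickets", "location": "Broadway", "date": "2010-04-12", "price": 95},
    {"category": "tickets", "location": "Broadway", "date": "2010-04-15", "price": 260},
    {"category": "tickets", "location": "Broadway", "date": "2010-04-16", "price": 410},
    {"category": "tickets", "location": "Off-Broadway", "date": "2010-04-13", "price": 60},
]

def structured_query(records, category, location, date_from, date_to):
    """Filter on structured fields, then return price-sorted results
    below two standard deviations above the subset's mean price."""
    subset = [r for r in records
              if r["category"] == category
              and r["location"] == location
              and date_from <= r["date"] <= date_to]  # ISO dates sort lexically
    cutoff = mean(r["price"] for r in subset) + 2 * stdev(r["price"] for r in subset)
    return sorted((r for r in subset if r["price"] < cutoff),
                  key=lambda r: r["price"])

results = structured_query(listings, "tickets", "Broadway",
                           "2010-04-11", "2010-04-18")
```

Every clause of the query maps onto a field comparison or an aggregate over the structured data, which is exactly what free-text retrieval cannot express directly.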
4. +
Introduction
Vertical Search = search on structured data
Vertical Search at Web-Scale:
Web-Scale datasets
Web-Scale query volumes
Interactive operation
Low latency requirements
Utility maximization across all involved parties
=> loads of fun ! : )
6. +
@Vast.com
Daily processing of up to 1 TB of unstructured and semi-structured Web data
Managing a ~150M-record operational dataset across multiple verticals
Handling peak loads of > 1000 search queries/sec
We’re hiring ! : )
7. +
Challenges in Vertical Search Engines
Web Data Retrieval
Unstructured Data
Data Processing Infrastructures
Vertical Search
Data Analytics
Computational Advertising
9. +
Web Data Retrieval
“Deep Web” crawling
Locating Deep Web Content Sources
Selecting Relevant Sources
Estimating Database Size
Understanding Content / Form Detection
Automatic Dispatch of HTML Forms
Predicting content in free text forms
Crawling non-HTML Content
Estimating Query Result Sparsity
URL Generation problem
Query Covering Problem
11. +
Unstructured Data
Unstructured Data – information that does not have a pre-defined data model
Handling Unstructured Data:
Data Cleaning
Tagging with Metadata
Vertical Classification
Schema Matching
Information Extraction
Ford Focus 2008 Convertible just $7000.. Absolute Beauty !!!!
make model year trim price ???
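The mapping above — from a free-form post to (make, model, year, trim, price) fields — can be sketched with hand-written patterns. This is an illustrative toy, not Vast.com's pipeline; a production extractor would learn its dictionaries and patterns from data:

```python
import re

# Toy make/model dictionary; real systems derive these from reference data.
MAKES = {"Ford": ["Focus", "Fusion", "Mustang"]}

def extract(text):
    """Pull structured fields out of an unstructured listing."""
    record = {}
    for make, models in MAKES.items():
        if make in text:
            record["make"] = make
            record.update({"model": m for m in models if m in text})
    if (m := re.search(r"\b(19|20)\d{2}\b", text)):      # 4-digit year
        record["year"] = int(m.group())
    if (m := re.search(r"\$(\d[\d,]*)", text)):          # dollar amount
        record["price"] = int(m.group(1).replace(",", ""))
    if (m := re.search(r"\b(Convertible|Sedan|Coupe|Hatchback)\b", text)):
        record["trim"] = m.group(1)
    return record

record = extract("Ford Focus 2008 Convertible just $7000.. Absolute Beauty !!!!")
```

The "???" in the slide is the point: the trailing "Absolute Beauty !!!!" carries no extractable attribute, and a robust extractor must ignore it.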
12. +
Unstructured Data
Information extraction from unstructured, ungrammatical
data
Reference Sets - relational data sets that consist of collections of known entities with associated common attributes
Reference Set Selection
Reference Set Generation
Record Linkage : Finding the “best matching” member of the reference set corresponding to each post
Challenge : Automatic Generation of Reference Sets
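Record linkage against a reference set can be sketched with a simple token-overlap score. The reference entities and the Jaccard similarity choice are illustrative assumptions; real linkage uses richer per-attribute similarity and learned weights:

```python
def tokens(s):
    return set(s.lower().split())

# A tiny reference set of known automotive entities (illustrative).
reference_set = [
    {"make": "Ford", "model": "Focus", "trim": "Convertible"},
    {"make": "Ford", "model": "Fusion", "trim": "Sedan"},
    {"make": "Honda", "model": "Civic", "trim": "Coupe"},
]

def link(post):
    """Return the reference entity whose attribute tokens best
    overlap the post, scored by Jaccard similarity."""
    post_tokens = tokens(post)
    def jaccard(entity):
        ent = tokens(" ".join(entity.values()))
        return len(ent & post_tokens) / len(ent | post_tokens)
    return max(reference_set, key=jaccard)

best = link("Ford Focus 2008 Convertible just $7000.. Absolute Beauty !!!!")
```

The automatic-generation challenge is precisely that someone must build `reference_set` in the first place, for every vertical.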
13. +
Data Processing Infrastructures
Infrastructures for continuous processing of unbounded streams
of unstructured data
Information Extraction as part of processing (non-trivial computation per processed entry)
Inherently distributed infrastructures - in order to support
performance and scalability
Time-to-site constraints. Ability to process out-of-band data.
Support for complex operations on aggregated data (de-duplication, static ranking, data enrichment, data cleaning/filtering …)
Support for data archival and off-line analysis
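One of the aggregate operations listed, de-duplication, can be sketched as a streaming filter keyed on a content hash. This is a minimal sketch under assumed field names; a production system would bound the `seen` state (TTL eviction, a Bloom filter, or a distributed store):

```python
import hashlib

def canonical_key(record):
    """Hash normalized fields so re-listed duplicates collapse to one key."""
    basis = "|".join(str(record.get(f, "")).lower()
                     for f in ("make", "model", "year", "price"))
    return hashlib.sha1(basis.encode()).hexdigest()

def dedupe(stream):
    """Emit only the first occurrence of each record from a stream."""
    seen = set()
    for record in stream:
        key = canonical_key(record)
        if key not in seen:
            seen.add(key)
            yield record

feed = [
    {"make": "Ford", "model": "Focus", "year": 2008, "price": 7000},
    {"make": "Ford", "model": "Focus", "year": 2008, "price": 7000},  # duplicate
    {"make": "Honda", "model": "Civic", "year": 2009, "price": 9000},
]
unique = list(dedupe(feed))
```

The generator shape matters: the input is an unbounded stream, so the operation must be incremental rather than a batch pass over a closed dataset.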
15. +
Data Processing Infrastructures
Distributed Computing Platforms:
Batch-oriented (MapReduce, Hadoop, BigTable, HBase…)
Stream-oriented (Flume, S4, Stream SQL…)
Distributed Data Stores (Dynamo/Cassandra/Riak…)
The curse of CAP Theorem:
It is impossible for a distributed system to simultaneously provide
all three of the following guarantees:
Consistency
Availability
Partition tolerance
16. +
Vertical Search
Large-Scale structured data search
Providing both analytic and canonical Information Retrieval functionality
Entries are represented in a Vector Space Model
Each result is represented as a data point – a tuple consisting of the appropriate fields:
(make, model, year, trim …)
17. +
Vertical Search
Search in Vector Space Model
Resulting subset generation
Sorting as linearization using selected metric
Dynamic subset criteria calculation
Search Result Clustering
“Similar” result search
…
… with up to ~100 ms response time
… at 10M+ records in index
… handling 100+ queries/sec/host
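The "similar result" operation above reduces to nearest-neighbour search over the tuple representation. A minimal sketch, assuming a toy two-field vector space (a real index would normalize features, use many more dimensions, and an approximate-NN structure rather than a linear scan):

```python
import math

# Each entry is a point in a numeric vector space: (year, price).
index = [
    ("ford focus 2008",  (2008.0, 7000.0)),
    ("ford focus 2010",  (2010.0, 11500.0)),
    ("honda civic 2009", (2009.0, 9000.0)),
]

def similar(query_vec, k=2):
    """'Similar result' search: the k nearest entries, distance-sorted."""
    return sorted(index, key=lambda item: math.dist(item[1], query_vec))[:k]

nearest = similar((2008.0, 7000.0))
```

Sorting by any chosen metric is the "linearization" mentioned on the previous slide; swapping the distance function changes the ranking without touching the index.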
18. +
Vertical Search
Faceted Search
fac-et (fas’it) :
1. One of the flat polished surfaces cut on a gemstone or occurring
naturally on a crystal.
2. One of numerous aspects, as of a subject.
Vocabulary problem for faceted data
Facet Design / selection
"the keywords that are assigned by indexers are often at
odds with those tried by searchers.”
Selection of information-distinguishing facet values
User-specific faceted search
Dynamic correlated facet generation
Distributing facet computation
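Dynamic correlated facet generation can be sketched as recounting facet values over the current result subset on every drill-down. The listings and facet names are illustrative assumptions:

```python
from collections import Counter

listings = [
    {"make": "Ford", "body": "Convertible"},
    {"make": "Ford", "body": "Sedan"},
    {"make": "Honda", "body": "Sedan"},
]

def facet_counts(records, facets):
    """Count each facet's values over the current result subset;
    recomputing per subset keeps the facets correlated."""
    return {facet: Counter(r[facet] for r in records) for facet in facets}

counts = facet_counts(listings, ["make", "body"])
# Drilling into make=Ford narrows the subset and its correlated facets:
ford_counts = facet_counts([r for r in listings if r["make"] == "Ford"], ["body"])
```

The distribution challenge on the slide arises because these counters must be merged across index shards for every query.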
19. +
Data Analytics
Clickstream Data Analysis
Learning from implicit user feedback
Anonymous user clustering
Learning to rank
Inventory/Market Trends
Rare Event detection
Price Prediction
Spam Content detection
20. +
Data Analytics
Challenges:
“Good Deal” detection
Recommendation Systems for Vertical Data with no explicit user
feedback
Accuracy of Automatic Valuation Models
Data-driven feature design
Click Prediction
User Behavior Modeling
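"Good Deal" detection is, at its simplest, a comparison of an asking price against a valuation estimated from comparables. A minimal sketch with an assumed threshold; the accuracy challenge on the slide is exactly that real automatic valuation models must regress on many features, not average a hand-picked bucket:

```python
from statistics import mean

def good_deal(asking_price, comparable_prices, threshold=0.95):
    """Flag a listing priced well below the mean of its comparables."""
    return asking_price < threshold * mean(comparable_prices)

comps = [7400, 7600, 7900, 8100, 7500]  # same make/model/year bucket
flag = good_deal(7000, comps)
```

Here the estimated value is $7700, so a $7000 listing clears the 95% threshold while a $7600 one does not.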
21. +
Computational Advertising
The central problem of computational advertising is to find
the "best match" between a given user in a given context and a
suitable advertisement.
[Slide graphic: ads displayed alongside the search results]
22. +
Computational Advertising
Vertical Search presents an additional challenge in the sense
that any of the actual search results can be “sponsored”
[Slide graphic: search results, any of which may be a sponsored ad]
23. +
Computational Advertising
Central challenge:
Find the “best match” between a given user in a given context
and a suitable advertisement
“best match” – maximizing the value for :
Users
Advertisers
Publishers
Each of the parties has a different set of utilities:
Users want relevance
Advertisers want ROI and volume
Publishers want revenue per impression/search
25. +
Computational Advertising
Analytical Apparatus:
Regression Analysis (Linear, Logistic, probit model, High
Dimensional methods)
Game Theory (Nash Equilibria, dominant strategy)
Auction Theory (Vickrey, GSP, VCG…)
Graph Theory (random walks on graphs, graph matching, etc.)
Information Retrieval Techniques (similarity metrics, etc.)
…
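Of the auction mechanisms named above, the sealed-bid second-price (Vickrey) auction is the simplest to sketch; GSP and VCG generalize the same idea to multiple ad slots. Bidder names and bids are illustrative:

```python
def second_price_auction(bids):
    """Vickrey auction: the highest bidder wins but pays the
    runner-up's bid, which makes truthful bidding a dominant strategy."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

winner, price = second_price_auction({"adv_a": 2.50, "adv_b": 1.75, "adv_c": 0.90})
```

Decoupling what the winner bids from what the winner pays is what aligns advertiser ROI with publisher revenue in the utility trade-off described two slides earlier.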
26. +
Conclusion
Vertical Search & Analytics at Web Scale == fun !!!
A source of a large number of relevant research & engineering problems !
An opportunity to tackle a wide spectrum of techniques across all areas of Computer Science and Engineering !
Jump on the bandwagon ! : )