This document provides a summary of the Flux of Meme project for the 1st semester deliverable. The project involves fetching geo-located social media data from Twitter, creating clusters of this information, extracting topics from the clusters, and analyzing statistics to create timeline predictions. Initial issues involved limited access to Twitter data and a small percentage of tweets being geo-tagged. The document outlines the software architecture and application lifecycle, and discusses plans to refine the topic extraction algorithm and Twitter data collection.
The document discusses different types of computer networks and network topologies. It describes Personal Area Networks (PAN), Local Area Networks (LAN), Wide Area Networks (WAN), and Metropolitan Area Networks (MAN). It also outlines different network topologies including bus, ring, star, mesh, tree, and hybrid topologies. Key details about each type of network and topology are provided.
The document discusses semantic web mediation, which involves two main steps: 1) providing semantic access to data through the use of ontologies, and 2) the mediation process. It describes applying these concepts to the Personae project, which uses a mediator to provide unified access and querying of distributed semantic web data sources described by different local ontologies. The mediator aligns the local ontologies to a global reference ontology to facilitate query answering across sources.
This is the NetworkedPlanet keynote presentation from the TMRA 2009 conference, focusing on the challenges facing the topic maps community.
ENHANCING AVAILABILITY FOR DISTRIBUTED REPLICATED SERVICES CONSIDERING NETWOR... - IJCNCJournal
Mechanisms that improve data or service availability are critical for an enterprise to ensure quality of service. Replication has long been used to improve system availability, and the number and placement of replicas are two key factors affecting it. In this paper, we consider the impact of node and network edge failures on the availability of replicated data or services. An effective availability modeling approach is designed, and efficient availability computing algorithms are developed, to model and compute the availability of replicated services in systems with a tree topology. The availability enhancement problem (maximizing the objective function) is transformed into a p-median problem (minimizing the objective function) by redefining the availability enhancement problem. An efficient replica allocation algorithm is developed to improve data availability in tree networks, with a runtime complexity of O(K|V|²), where K is the number of replicas and |V| is the number of nodes in the tree network. Finally, experimental studies were conducted to evaluate how efficiently and effectively the proposed availability computing and availability enhancement algorithms improve the availability of replicated data or services. The results show that the proposed solutions are efficient and effective for both availability computing and availability enhancement.
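The p-median framing can be illustrated with a small greedy heuristic in Python. This is a sketch of the general idea only, not the paper's O(K|V|²) algorithm; the toy tree and function names are invented for illustration.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distance from src to every node of an unweighted tree."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def greedy_p_median(adj, k):
    """Greedily pick k replica sites minimizing the total distance from
    every node to its nearest replica (a classic p-median heuristic)."""
    nodes = list(adj)
    dists = {u: bfs_dist(adj, u) for u in nodes}  # all-pairs distances
    chosen = []
    best = {v: float("inf") for v in nodes}  # distance to nearest chosen site
    for _ in range(k):
        site = min(
            (u for u in nodes if u not in chosen),
            key=lambda u: sum(min(best[v], dists[u][v]) for v in nodes),
        )
        chosen.append(site)
        for v in nodes:
            best[v] = min(best[v], dists[site][v])
    return chosen

# Toy tree: node 1 is the most central node
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(greedy_p_median(tree, 1))
```

With one replica, the heuristic places it at the tree's most central node, which minimizes the total distance every other node must traverse.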
Web 2.0 allows users to interact and collaborate as creators of content in a virtual community, facilitating information sharing and interoperability. The teacher Mauricio Castellano was interviewed about Web 2.0 and answered each question correctly, although society at large has only a superficial knowledge of Web 2.0, or none at all, which limits its use.
The document discusses computer viruses, defined as malicious programs that alter a computer's operation without the user's permission. It explains that a virus can replicate itself and insert itself into other programs or storage areas in order to reproduce, and that although viruses are sometimes harmless, in many cases they cause significant damage by deleting files or formatting disks. Finally, it recommends keeping software up to date and using licensed software to prevent viral infections.
This short document promotes the creation of presentations using Haiku Deck on SlideShare. It includes a stock photo and brief text suggesting the reader may be inspired to create their own Haiku Deck presentation. The document promotes getting started with Haiku Deck on SlideShare to develop a presentation.
An analysis of each verse of Sonnet 31 by Sir Philip Sidney.
Done by the students of The Oxford School (Panama) Gabriel Samori, Guillaume Lalaurie, Marianne Perez, Oriana Gonzales, Octavio Torres and Eric Chacon.
The document describes five dimensions that make up critical reading competence: 1) the evident textual dimension, 2) the textual-relations dimension, 3) the enunciative dimension, 4) the evaluative dimension, and 5) the sociocultural dimension. Each dimension focuses on a different aspect of understanding and critically analyzing a text.
This document presents a lesson plan for teaching fables to second-grade students. The class focuses on identifying the elements of a fable: the beginning, the conflict, and the resolution. The teacher will read the fable "Doña Cebra y Doña Jirafa," and the students will then identify these parts in the fable "El León y el Ratón." At the end, the students will say whether they liked the latter fable and why.
The document criticizes the current education system and proposes alternatives. It argues that student failure is often due to flawed teaching methods rather than to student ability. It also suggests that grades can be discriminatory and that education should focus more on reflection about the future and on student happiness than on competition. It concludes that education should help students live happy lives in order to create a more harmonious society.
The Frankfurt School emerged in Germany after the First World War because the proletariat did not produce the revolution Marx had predicted. Left-wing intellectuals questioned the relationship between autonomous thought and political commitment. The Frankfurt School adopted Marx's critical approach as a theory concerned with both action and critique, establishing critical theory. Over the years, the School revised its theoretical positions in response to changes in context.
This document summarizes a research paper that presents a unified model for predicting the geographic location of Twitter users using both text-based and network-based approaches. The model uses text-based logistic regression on user tweets and mentions to generate initial predictions, which are then combined with a label propagation approach using the Twitter mention network to produce the final predictions, achieving state-of-the-art results on three Twitter geolocation datasets. The approach filters out highly mentioned "celebrity" users who connect locations globally in order to produce more accurate localized predictions.
ACL2015 Poster: Twitter User Geolocation Using a Unified Text and Network Pre... - Afshin Rahimi
This document summarizes a research paper on predicting the geographic location of Twitter users using a unified text and network prediction model. The model uses both text-based features from users' tweets as well as the network of mentions between users. It achieves state-of-the-art results on three Twitter geolocation datasets by combining text and network-based predictions through a modified label propagation technique. The unified model outperforms both individual text-based and network-based models for predicting locations.
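The network side of such a model can be illustrated with a minimal label-propagation sketch over a mention graph. This is an illustration of the general technique with invented data, not the paper's modified variant: users with known locations keep their labels fixed, and everyone else adopts the majority label of their neighbors.

```python
from collections import Counter

def label_propagation(graph, seeds, iterations=10):
    """Propagate location labels over a mention graph.

    graph: user -> list of mentioned users; seeds: user -> known location.
    """
    labels = dict(seeds)
    for _ in range(iterations):
        updated = dict(labels)
        for user, neighbors in graph.items():
            if user in seeds:
                continue  # users with known locations are fixed
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if votes:
                updated[user] = votes.most_common(1)[0][0]
        labels = updated
    return labels

# Invented toy mention graph and seed locations
mentions = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
seeds = {"a": "Melbourne", "d": "London"}
print(label_propagation(mentions, seeds))
```

Unlabeled users acquire the location of the labeled users they are connected to; this is also where filtering out highly connected "celebrity" accounts matters, since they would otherwise bridge distant locations.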
The document summarizes Vitus Lorenz-Meyer's thesis defense which presented a flexible toolkit called PWHN for scalable instrumentation and data collection in peer-to-peer networks. PWHN extends the MapReduce model to distributed systems by using techniques from peer-to-peer networks to construct an efficient aggregation tree based on key-based routing. This allows arbitrary monitoring programs to be run over large, dynamic networks with minimal overhead.
What Are Links in Linked Open Data? A Characterization and Evaluation of Link... - Armin Haller
Linked Open Data promises guiding principles for publishing interlinked knowledge graphs on the Web as findable, accessible, interoperable, and reusable datasets. In this talk I argue that, while Linked Data may therefore be viewed as a basis for instantiating the FAIR principles, a number of open issues still cause significant data quality problems even when knowledge graphs are published as Linked Data. I will first define the boundaries of what constitutes a single coherent knowledge graph within Linked Data, i.e., present a principled notion of what a dataset is and what links within and between datasets are. I will also define different link types for data in Linked datasets and present the results of our empirical analysis of linkage among the datasets of the Linked Open Data cloud. Recent results from our analysis of Wikidata, which has not been part of the Linked Open Data cloud, will also be presented.
- What are clustering, honeypots, and density-based clustering?
- What is OPTICS clustering, how does it differ from density-based clustering, and how can it be used for outlier detection?
- What is so-called soft clustering, how does it differ from hard clustering, and how can it be used for outlier detection?
Data Tactics Data Science Brown Bag (April 2014) - Rich Heimann
This is a presentation we deliver internally every quarter as part of our Data Science Brown Bag series. It covered different types of soft clustering techniques, all of which the team currently uses depending on the complexity of the data and of customer problems. If you are interested in learning more about working with L-3 Data Tactics, or in working for the L-3 Data Tactics Data Science team, please contact us soon! Thank you.
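The distinction between soft and hard clustering can be shown with a toy one-dimensional EM fit of a two-component Gaussian mixture: every point gets a membership probability for each cluster rather than a single hard assignment. This is an illustrative sketch only (fixed variance, invented data), not any technique from the presentation itself.

```python
import math

def em_two_gaussians(xs, iters=50):
    """Tiny EM for a 1-D two-Gaussian mixture with fixed unit variance.

    Returns the two means and each point's soft membership (responsibility)
    for the first component.
    """
    m1, m2, s, w = min(xs), max(xs), 1.0, 0.5  # crude initialization
    r = [0.5] * len(xs)
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in xs:
            p1 = w * math.exp(-((x - m1) ** 2) / (2 * s * s))
            p2 = (1 - w) * math.exp(-((x - m2) ** 2) / (2 * s * s))
            r.append(p1 / (p1 + p2))
        # M-step: update the means and the mixing weight
        m1 = sum(ri * x for ri, x in zip(r, xs)) / sum(r)
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / sum(1 - ri for ri in r)
        w = sum(r) / len(xs)
    return m1, m2, r

xs = [0.0, 0.2, 0.4, 9.8, 10.0, 10.2]  # two well-separated groups
m1, m2, resp = em_two_gaussians(xs)
print(round(m1, 1), round(m2, 1))
```

Points far from both means, or with responsibilities near 0.5, are natural candidates for the outlier detection the questions above ask about.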
Alberto Morales is seeking a challenging career opportunity utilizing his experience and skills. He has a high school diploma from Los Fresnos High School and a bachelor's degree in Electrical Engineering Physics from the University of Texas Rio Grande Valley. His relevant coursework includes circuits, electronics, electromagnetic theory, and communications. He has skills in technical programs, components, oscillator design, and power electronics. His capstone project involved improving wireless power transfer. Currently he works in customer service while also volunteering in his community.
Social Computing Research with Apache Spark - Matthew Rowe
The document discusses social computing research conducted using Apache Spark. It summarizes a project that analyzed the diffusion of language innovations on social media by collecting data from Twitter and Reddit, identifying new terms and variations, and computing frequency and form values over time and across communities using Spark. It also summarizes another project that used Spark to analyze the accuracy of UK web filters by classifying blocked and unblocked URLs and calculating accuracy rates for different internet service providers.
The document outlines the program structure for the second year of engineering studies at the University of Mumbai for semesters 3 and 4. It includes the course codes, names, teaching schemes with credits, and examination schemes for the courses. The core courses cover topics like data structures, databases, algorithms, and computer programming. The document also provides course objectives and outcomes, as well as a detailed syllabus covering concepts like stacks, queues, linked lists, trees, graphs, searching, sorting, and applications of data structures. Assessment includes internal tests and an end semester exam.
Named Entity Recognition using Tweet Segmentation - IRJET Journal
This document summarizes three research papers on named entity recognition (NER) in tweets. The first paper describes a system called TwiNER that performs NER on targeted Twitter streams to understand user opinions expressed in tweets about organizations. The second paper studies the challenges of NER in tweets due to their terse nature and presents a distantly supervised approach using Labeled LDA that improves NER performance. The third paper proposes modeling user interests in Twitter by extracting named entities from tweets using an unsupervised segmentation approach to avoid large annotation overhead.
This document describes a new approach called BLOOMS+ for performing contextual ontology alignment of Linked Open Data datasets with an upper ontology. BLOOMS+ leverages contextual information from Wikipedia category hierarchies to compute similarities between concepts in different ontologies. It computes class similarity, contextual similarity between super classes, and an overall similarity to determine equivalence or subsumption relationships between concepts during alignment. The approach is evaluated on aligning several LOD ontologies to the PROTON upper ontology, outperforming existing solutions. Future work involves extending this approach to utilize more contextual sources and enable seamless querying across aligned datasets.
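The flavor of category-based contextual similarity can be sketched as a simple set-overlap score between the Wikipedia category sets of two concepts. The exact BLOOMS+ scoring function is defined in the paper; this function and the category sets below are invented for illustration.

```python
def overlap_similarity(cats_a, cats_b):
    """Score two concepts by the overlap of their category sets,
    normalized by the smaller set (a rough, illustrative measure)."""
    if not cats_a or not cats_b:
        return 0.0
    common = cats_a & cats_b
    return len(common) / min(len(cats_a), len(cats_b))

# Hypothetical category sets for two concepts being aligned
person_cats = {"People", "Humans", "Agents"}
author_cats = {"People", "Writers", "Agents", "Occupations"}
print(overlap_similarity(person_cats, author_cats))
```

A high score between two classes, combined with the similarity of their superclasses, is the kind of evidence used to decide equivalence versus subsumption during alignment.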
Learning Resource Metadata Initiative: Vocabulary Development Best Practices - Mike Linksvayer
This document discusses best practices for developing learning resource metadata vocabularies based on guidelines from the Dublin Core Metadata Initiative. It recommends defining clear use cases, selecting an appropriate domain model, reviewing existing vocabularies to reuse terms, designing detailed metadata records, providing usage guidelines, and engaging relevant communities to ensure long-term stewardship of the vocabulary. The Learning Resource Metadata Initiative (LRMI) could benefit from following these best practices in its development.
The document describes a comparative study of various machine learning and neural network models for detecting abusive language on Twitter. It finds that a bidirectional GRU network trained on word-level features, with a Latent Topic Clustering module, achieves the most accurate results with an F1 score of 0.805 for detecting abusive tweets. Additionally, it explores using context tweets as additional features and finds this improves some models' performance.
How Are Graph Databases Used in Police Departments? - Samet KILICTAS
This presentation introduces the basics of graph concepts and graph databases. It explains how graph databases are used, with sample use cases from industry, and how they can be applied in police departments. Questions like "When should I use a graph DB?" and "Should I solve this problem with a graph DB?" are answered.
1) The document discusses the problem of broken links in the Web of Data (also known as the Linked Data cloud). As resources on the web change over time, links between them can become broken when the target resource is removed, moved, or changed.
2) It defines two types of broken links: structurally and semantically broken. A structurally broken link occurs when the representations of the target resource can no longer be retrieved. A semantically broken link occurs when the target resource has changed meaning.
3) The analysis of changes between two versions of DBpedia data showed many resources were moved, removed, or created, demonstrating the broken links problem. Redirect links in DBpedia help trace moved resources.
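The versioned-comparison idea behind the DBpedia analysis can be sketched as a small classifier: given the resource sets of two dataset versions plus a redirect map, each old resource is kept, moved (traceable via a redirect), or removed (leaving links to it structurally broken). The resource names below are invented for illustration.

```python
def classify_resources(old, new, redirects):
    """Classify each resource of the old version as kept, moved, or removed."""
    report = {"kept": set(), "moved": set(), "removed": set()}
    for r in old:
        if r in new:
            report["kept"].add(r)
        elif r in redirects and redirects[r] in new:
            report["moved"].add(r)   # traceable via a redirect link
        else:
            report["removed"].add(r)  # links targeting r are structurally broken
    return report

# Hypothetical resource sets from two dataset versions
old = {"db:Alice", "db:Bob", "db:Carol"}
new = {"db:Alice", "db:Carol_Smith"}
redirects = {"db:Carol": "db:Carol_Smith"}
print(classify_resources(old, new, redirects))
```

Semantically broken links are harder: the target may still resolve while its meaning has changed, which this purely structural comparison cannot detect.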
A technical paper presentation on Evaluation of Deep Learning Techniques in S... - VarshaR19
"Evaluation of Deep Learning Techniques in Sentiment Analysis from Twitter Data" is an IEEE paper presented at the 2019 International Conference on Deep Learning & Machine Learning in Emerging Applications. This is a presentation on that paper, prepared as part of my college seminar.
This document discusses interlinking in linked data and the challenges of link discovery. It defines interlinking as the degree to which entities representing the same concept are linked to each other. It describes two categories of link discovery frameworks: ontology matching and instance matching. The key challenges of link discovery are computational complexity and selecting an appropriate link specification. Current approaches include domain-specific and universal frameworks, and active learning techniques can help guide selection of optimal link specifications.
This document discusses analyzing email communication networks in open source software projects to study their social structure. The authors extracted email aliases from the Apache mailing lists and built a social network graph of participants based on reply relationships between emails. They analyzed metrics like in-degree, out-degree, and betweenness centrality, finding that core developers tend to have higher values. Communication activity, like betweenness, correlated with development contributions to source code files.
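The degree metrics described can be sketched by building a directed reply graph from (sender, replied-to) pairs and counting edges per participant. Betweenness centrality would additionally require a shortest-path algorithm such as Brandes'; only the degree metrics are shown here, and the data is invented.

```python
from collections import Counter

# Hypothetical (sender, replied_to) pairs extracted from a mailing list
replies = [
    ("alice", "bob"), ("carol", "bob"), ("bob", "alice"),
    ("dave", "alice"), ("alice", "carol"),
]
out_deg = Counter(src for src, _ in replies)  # replies each participant sent
in_deg = Counter(dst for _, dst in replies)   # replies each participant received
print(in_deg["bob"], out_deg["alice"])
```

Participants with high in-degree and betweenness in such a graph were the ones the study found to correlate with core development activity.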
Future trends in information management in 2015 will focus on providing access to all information from all people at all times. This will be achieved through advances in semantic computing that leverage large data sources to find meaning and relationships. Social media will continue to be an important data source, and real-time mining of services like Twitter will provide insights. Translation research will also help overcome language barriers by developing hybrid human-machine systems. Cloud services like Windows Azure will allow ubiquitous access to computing resources and data. Immersive experiences enabled by technologies like Photosynth will begin to merge the digital and physical world.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Global Situational Awareness of A.I. and where it's headed
vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Exploiting Text and Network Context for Geolocation of Social Media Users
Afshin Rahimi,♥
Duy Vu,♠
Trevor Cohn,♥
and Timothy Baldwin♥
♥ Department of Computing and Information Systems, ♠ Department of Mathematics and Statistics, The University of Melbourne
OVERVIEW
Task: Find the location of Twitter users based on text and network information.
Previous Shortcoming: No comparison of text-based and network-based models, no use of both.
Datasets: 3 Twitter geolocation datasets: GeoText, Twitter-US, Twitter-World.
Sample Format: userid, text, mention-list, latitude/longitude
YOU ARE WHERE YOUR WORDS SAY YOU ARE
Usage of mountain in U.S.
TEXT-BASED MODEL (LR)
Logistic regression with l1 regularisation over k-d tree discretisation of latitude/longitude.
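The discretisation step can be sketched in a few lines of Python. This is a minimal, illustrative version (function and parameter names are my own, not from the paper's code): it recursively median-splits (latitude, longitude) points so each leaf cell holds a similar number of users, and each leaf cell then serves as one class label for the logistic regression over word features.

```python
def kd_discretise(points, bucket_size=50, depth=0):
    """Median-split (lat, lon) points into balanced leaf cells.

    Alternates the split axis between latitude and longitude; recursion
    stops once a cell holds at most `bucket_size` points. Each leaf
    cell would serve as one class for the l1-regularised logistic
    regression over word features.
    """
    if len(points) <= bucket_size:
        return [points]
    axis = depth % 2                       # 0 = latitude, 1 = longitude
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # median split balances user counts
    return (kd_discretise(pts[:mid], bucket_size, depth + 1) +
            kd_discretise(pts[mid:], bucket_size, depth + 1))

# Toy usage: four user coordinates split into single-user cells.
cells = kd_discretise([(40.7, -74.0), (34.1, -118.2),
                       (41.9, -87.6), (29.8, -95.4)], bucket_size=1)
```

Splitting at the median (rather than at a fixed grid) keeps the number of training users per cell roughly uniform, which matters for class balance in the downstream classifier.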
[Figure: map of the U.S., longitude 130°W to 60°W on the x-axis and latitude 25°N to 50°N on the y-axis]
YOU ARE WHERE YOUR FRIENDS ARE
Most of our online interactions are local.
[Figure: a Twitter @-mention between users]
NETWORK-BASED MODEL (LP)
Label Propagation in @-mention Network:
• Build an @-mention network.
• Initialise the location of training nodes with their known location.
• Iteratively update non-training nodes’ locations to the median of their neighbours.
• Converges after 10 iterations.
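The steps above can be sketched as follows. This is a minimal illustration with made-up names, using a dict-of-sets adjacency in place of a real graph library; taking the coordinate-wise median is one reasonable reading of "the median of their neighbours".

```python
from statistics import median

def propagate(edges, train_locs, iterations=10):
    """Label propagation over an @-mention network (illustrative sketch).

    edges: dict user -> set of users they mention or are mentioned by
    train_locs: dict user -> (lat, lon), for training users only
    Non-training users are repeatedly set to the coordinate-wise
    median of their neighbours' current location estimates.
    """
    est = dict(train_locs)                 # training locations stay fixed
    for _ in range(iterations):            # the poster reports convergence in 10
        for user, nbrs in edges.items():
            if user in train_locs:
                continue                   # never overwrite known locations
            known = [est[n] for n in nbrs if n in est]
            if known:
                est[user] = (median(p[0] for p in known),
                             median(p[1] for p in known))
    return est

# Toy usage: user "a" mentions two users with known locations.
edges = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
est = propagate(edges, train_locs={"b": (40.0, -75.0), "c": (42.0, -71.0)})
# est["a"] == (41.0, -73.0), the coordinate-wise median of b and c
```

The median (rather than the mean) makes the estimate robust to a few far-away neighbours, e.g. a celebrity account mentioned from everywhere.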
NETWORK VERSUS TEXT
• For connected users, network-based models are more accurate.
• For disconnected users (about 20% of the nodes), text-based models are more accurate.
• Solution: Utilise both text and network information together!
LABEL PROPAGATION OVER TEXT PREDICTIONS
• Initialise training nodes with their known location and test nodes with their text-based prediction.
• Iteratively update the location of non-training nodes to the median of their neighbours.
• Converges after 10 iterations.
• Isolated test nodes will keep their text-based prediction.
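A minimal sketch of this hybrid step, with the same caveats as before (illustrative names, dict-of-sets adjacency): the only change from plain label propagation is that test nodes start from the text-based model's output, so an isolated test node simply retains its text prediction.

```python
from statistics import median

def hybrid_propagate(edges, train_locs, text_preds, iterations=10):
    """Label propagation over text predictions (illustrative LP-LR sketch).

    Test nodes are seeded with the text-based model's prediction;
    training nodes are fixed at their known location; everyone else is
    repeatedly moved to the coordinate-wise median of their neighbours.
    """
    est = dict(text_preds)
    est.update(train_locs)               # known locations take precedence
    for _ in range(iterations):
        for user, nbrs in edges.items():
            if user in train_locs:
                continue                 # training nodes never move
            known = [est[n] for n in nbrs if n in est]
            if known:                    # isolated nodes keep their text prediction
                est[user] = (median(p[0] for p in known),
                             median(p[1] for p in known))
    return est

# Toy usage: "a" snaps to its neighbour; isolated "z" keeps its text guess.
edges = {"a": {"b"}, "b": {"a"}, "z": set()}
est = hybrid_propagate(edges,
                       train_locs={"b": (40.0, -75.0)},
                       text_preds={"a": (30.0, -90.0), "z": (48.0, -122.0)})
```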
DENVER’S TOP FEATURES
RESULTS
State-of-the-art results on all three datasets!
[Figure: bar chart of median error in km (y-axis 0 to 600) on GeoText, Twitter-US and Twitter-World for the Text-based Method (LR), the Network-based Method (LP), the Hybrid Method (LP-LR), Wing and Baldridge (2014), and Ahmed et al. (2013)]