This document provides an overview of the Word2Vec deep learning technique for generating word embeddings from large text corpora. It begins with an introduction to deep learning applications in biotechnology. The document then covers the traditional one-hot encoding representation of words and its limitations. It introduces Word2Vec as a method to map words to vectors of continuous values such that similar words have similar vectors. Key aspects covered include the skip-gram architecture, negative sampling, and training Word2Vec models on large datasets. Applications to materials science literature are discussed. Finally, potential project ideas involving applying Word2Vec to biological literature and genomes are proposed.
This document provides an overview and introduction to representation learning of text, specifically word vectors. It discusses older techniques like bag-of-words and n-grams, and then introduces modern distributed representations like word2vec's CBOW and Skip-Gram models as well as the GloVe model. The document covers how these models work, are evaluated, and techniques to speed them up like hierarchical softmax and negative sampling.
Deep dive into the world of word vectors. We will cover the bigram model, Skip-gram, CBOW, and GloVe. Starting from the simplest models, we will journey through key results and ideas in this area.
https://www.meetup.com/Deep-Learning-Bangalore/events/239996690/
This paper aims to develop an effective sentence model using a dynamic convolutional neural network (DCNN) architecture. The DCNN applies 1D convolutions and dynamic k-max pooling to capture syntactic and semantic information from sentences with varying lengths. This allows the model to relate phrases far apart in the input sentence and draw together important features. Experiments show the DCNN approach achieves strong performance on tasks like sentiment analysis of movie reviews and question type classification.
AI&BigData Lab. Mostapha Benhenda. "Word vector representation and applications" (GeeksLab Odessa)
23.05.15, Odessa. Impact Hub Odessa. AI&BigData Lab conference.
Mostapha Benhenda (Organizer, Kyiv deep learning meetup)
«Word vector representations and applications» (in English)
Word vector representations are functions that map words to vectors in a way that preserves their meaning. These vectors can then be fed to machine learning algorithms, with broad practical applications, including machine translation and sentiment analysis.
More details:
http://geekslab.co/
https://www.facebook.com/GeeksLab.co
https://www.youtube.com/user/GeeksLabVideo
Word embeddings are a technique for converting words into vectors of numbers so that they can be processed by machine learning algorithms. Words with similar meanings are mapped to similar vectors in the vector space. There are two main types of word embedding models: count-based models that use co-occurrence statistics, and prediction-based models like CBOW and skip-gram neural networks that learn embeddings by predicting nearby words. Word embeddings allow words with similar contexts to have similar vector representations, and have applications such as document representation.
The document discusses basic concepts related to continuous functions. It begins with an introduction and motivation for studying continuous functions. Some key reasons mentioned are that continuous functions are needed for integration and as underlying functions in differential equations. The document then provides definitions of limits and continuity in terms of limits. It gives examples of determining limits and continuity for various functions. Contributors to the field like Bolzano, Cauchy, and Weierstrass are also acknowledged. The document concludes with additional definitions of continuity, examples, and discussions of uniform continuity.
This document contains notes from a course on theory of computation taught by Professor Michael Sipser at MIT in Fall 2012. The notes were taken by Holden Lee and cover 25 lectures on topics including finite automata, regular expressions, context-free grammars, pushdown automata, Turing machines, decidability, and complexity theory. In particular, the notes summarize key definitions, theorems, and problems discussed in each lecture, with the overarching goal of understanding what types of problems can and cannot be solved by a computer.
This document summarizes a presentation on using an LSTM neural network to predict bitcoin price movements based on sentiment analysis of twitter data. It describes collecting over 1 million tweets related to bitcoin, representing the words in the tweets as word vectors, training an LSTM model on the vectorized tweet data with sentiment labels, and evaluating whether the predicted sentiment correlates with bitcoin price changes. While the results did not find a relationship between sentiment and price according to this model, improvements are discussed such as using a training set more similar to the actual tweet data.
The document discusses Particle Swarm Optimization (PSO) algorithms and their application in engineering design optimization. It provides an overview of optimization problems and algorithms. PSO is introduced as an evolutionary computational technique inspired by animal social behavior that can be used to find global optimization solutions. The document outlines the basic steps of the PSO algorithm and how it works by updating particle velocities and positions to track the best solutions. Examples of applications to model fitting and inductor design optimization are provided.
The document describes efficient algorithms for projecting a vector onto the l1-ball (sum of absolute values being less than a threshold). It presents two methods: 1) An exact projection algorithm that runs in expected O(n) time, where n is the dimension. 2) A method for vectors with k perturbed elements outside the l1-ball, which projects in O(k log n) time. It demonstrates these algorithms outperform interior point methods on various learning tasks, providing models with high sparsity.
The development of the Web and social networks, together with the mass digitization of documents, is contributing to a renewal of the humanities and social sciences, of the study of literary and cultural heritage, and of the way scientific literature in general is exploited.
The digital humanities, which cross various disciplines with computer science, treat as central the questions of data volume, diversity, origin, veracity, and representativeness. Information is carried within textual "documents" (books, Web pages, tweets...), as well as audio, video, or multimedia ones; they may include illustrations or graphics.
Making sense of such resources requires the development of robust computational approaches, able to scale and suited to the fundamentally ambiguous and varied nature of the information handled (natural language or images to interpret, multiple points of view...).
While statistical machine learning approaches are commonplace for classification or information extraction tasks, they must cope with sparse vector spaces of very high dimension (several million), be able to exploit resources (for example lexicons or thesauri), and take into account or produce semantic annotations that can later be reused.
To address these challenges, infrastructures have been created, such as HumaNum at the national level and DARIAH or CLARIN at the European level, and recommendations have been established worldwide, such as the TEI (Text Encoding Initiative). Platforms serving scientific information, such as the OpenEdition.org facility of excellence, are another essential building block for the preservation of and access to "Big Digital Humanities", and also for fostering the reproducibility and understanding of experiments and their results.
This document provides a tutorial and survey on attention mechanisms, transformers, BERT, and GPT. It first explains attention in sequence-to-sequence models, including the basic sequence-to-sequence model without attention and with attention. It then explains self-attention and how it allows for computing attention over sequences. The document goes on to describe transformers, which are composed solely of attention modules without recurrence. It explains the encoder and decoder components of transformers, including positional encoding, multi-head attention, and masked multi-head attention. Finally, it introduces BERT and GPT, which are stacks of transformer encoders and decoders, respectively, and explains their characteristics and workings.
FOSDEM 2013: Flexible querying of graph data (Petra Selmer)
These are the slides from a talk I presented in the Graph Processing room at FOSDEM 2013, in which I discussed my PhD topic: a query language allowing for the flexible querying of complex paths within graph-structured data.
This document provides an overview of the Introduction to Algorithms course, including the course modules and motivating problems. It introduces the Document Distance problem, which aims to define metrics to measure the similarity between documents based on word frequencies. It discusses an initial Python program ("docdist1.py") to calculate document distance that runs inefficiently due to quadratic time list concatenation. Profiling identifies this as the bottleneck. The solution is to use list extension, resulting in "docdist3.py". Further optimizations include using a dictionary to count word frequencies in constant time, creating "docdist4.py". The document outlines remaining opportunities like improving the word extraction and sorting algorithms.
[EMNLP] What is GloVe? Part I - Towards Data Science (Nikhil Jaiswal)
This document introduces GloVe (Global Vectors), a method for creating word embeddings that combines global matrix factorization and local context window models. It discusses how global matrix factorization uses singular value decomposition to reduce a term-frequency matrix to learn word vectors from global corpus statistics. It also explains how local context window models like skip-gram and CBOW learn word embeddings by predicting words from a fixed-size window of surrounding context words during training. GloVe aims to learn from both global co-occurrence patterns and local context to generate word vectors.
This document discusses various applications of linear algebra in different fields such as abstract thinking, chemistry, coding theory, cryptography, economics, elimination theory, games, genetics, geometry, graph theory, heat distribution, image compression, linear programming, Markov chains, networking, and sociology. It provides examples of how linear algebra concepts such as systems of linear equations and matrix operations are used in topics like balancing chemical equations, error detection in coding, encryption/decryption, economic models, genetic inheritance, and finding lines and circles in geometry.
This presentation is about how global terminology can evolve without a centralized organisation. The simple idea is that everybody has to assert the identity of at least two identifiers for the same thing. These local semantic handshakes will have the effect of global terminological alignment.
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati... (Jonathon Hare)
Tutorial given at the "Reality of the Semantic Gap in Image Retrieval" tutorial at the first international conference on Semantics And digital Media Technology (SAMT 2006), 6th December 2006.
Word embeddings and applications to machine translation and sentiment analysis (Mostapha Benhenda)
This document provides an overview of word embeddings and their applications. It discusses how word embeddings represent words as vectors such that similar words have similar vectors. Applications discussed include machine translation, sentiment analysis, and convolutional neural networks. It also provides an example of the GloVe algorithm for creating word embeddings, which involves building a co-occurrence matrix from text and factorizing the matrix to obtain word vectors.
This document discusses analysis fundamentals for measuring algorithm efficiency. It defines asymptotic analysis and big O notation for describing how a function's runtime grows relative to the input size. Common time complexities like constant, logarithmic, linear, quadratic, and exponential are explained. Examples are given to show how problem sizes and runtimes scale based on these complexity classes when the input or computer speed changes.
This document summarizes numerical methods used in various fields including engineering, crime detection, scientific computing, finding roots, and solving heat equations. It discusses how numerical methods are widely used in engineering to model systems using mathematical equations when analytical solutions are not possible. Examples of applying numerical methods include structural analysis, fluid dynamics, image processing to deblur photos, and algorithms for finding roots of equations and solving differential equations.
This document discusses natural language inference and summarizes the key points as follows:
1. The document describes the problem of natural language inference, which involves classifying the relationship between a premise and hypothesis sentence as entailment, contradiction, or neutral. This is an important problem in natural language processing.
2. The SNLI dataset is introduced as a collection of half a million natural language inference problems used to train and evaluate models.
3. Several approaches for solving the problem are discussed, including using word embeddings, LSTMs, CNNs, and traditional bag-of-words models. Results show LSTMs and CNNs achieve the best performance.
A Probabilistic Attack On NP-Complete ProblemsBrittany Allen
This document discusses reformulating NP-complete problems in terms of continuous mathematics using probability theory. Specifically, it considers the 3-SAT NP-complete problem and introduces new probability variables to represent bit assignments. A cost function is constructed as a sum of clause satisfaction probabilities. Key properties of the cost function are that it is harmonic over subsets of variables and its Hessian has zero diagonal entries. The cost function is always positive inside the problem's domain and achieves its min/max on the boundary. The spectrum of cost function values on vertices corresponds to number of unsatisfied clauses. Overall, the approach reformulates 3-SAT in terms of a harmonic cost function to manipulate solutions without examining them individually.
Our project is about guessing the correct missing word in a given sentence. To find or guess the missing word, we have two main methods: one is statistical language modeling, while the other is neural language models. Statistical language modeling depends on the frequency of the relations between words, and here we use a Markov chain. Neural language models use artificial neural networks with deep learning; here we use BERT, the state of the art in language modeling provided by Google.
This document summarizes recent results from Mircea-Dan Hernest's PhD thesis on optimizing proof-theoretic techniques used by Ulrich Kohlenbach for extracting bounds from proofs in analysis. Specifically, it explores adapting Kohlenbach's "light monotone Dialectica" interpretation to proofs involving "non-computational" quantifiers. It describes how ε-arithmetization, elimination of extensionality, and model interpretation can be applied in this "non-computational" setting while maintaining certain restrictions. The goal is to more efficiently extract moduli from a larger class of non-trivial analytical proofs.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Global Situational Awareness of A.I. and where it's headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Open Source Contributions to Postgres: The Basics. POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... (Kaxil Naik)
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to: execute prompts in text and chat; cover multimodal use cases with image prompts; fine-tune and distill to improve knowledge domains; and run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps following generative AI industry trends.
Lecture1.pptx
1. CS 886 Deep Learning for Biotechnology
Ming Li
Jan. 2022
2. Prelude: our world
What do they have in common?
These are all lives encoded by DNA / RNA, from 30k bases to 3B bases and more.
Genes encoded by DNA are translated into thousands of proteins, hence life.
3. Deep Learning
Since its invention, deep learning has changed many research fields: speech recognition, image processing, natural language processing, autonomous driving, industrial control, and especially biotechnology (for example, protein structure prediction). In this class, we will review applications of deep learning in biotechnology. The first few lectures cover the necessary background in deep learning.
01. Word2Vec
02. Attention / Transformer
03. Pretraining: GPT and BERT
04. Deep learning applications in proteomics
05. Student presentations begin
5. Things you need to know:
Dot product:
a · b = ||a|| ||b|| cos(θ_ab) = a_1 b_1 + a_2 b_2 + … + a_n b_n
From this one can derive cosine similarity:
cos θ_ab = (a · b) / (||a|| ||b||)
Softmax function:
If we take an input of [1, 2, 3, 4, 1, 2, 3], its softmax is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175].
The softmax function highlights the largest values and suppresses the others, so that the outputs are all positive and sum to 1 (see the sketch below).
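To make these two definitions concrete, here is a minimal Python sketch (NumPy-based; the function names are ours, not from the lecture) that reproduces the softmax example above:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta_ab) = (a . b) / (||a|| ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def softmax(x):
    # Subtracting the max is a standard numerical-stability trick;
    # the outputs are all positive and sum to 1.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175] (rounded)
```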
6. Transforming from discrete to “continuous” space
1. Calculus (for computing the area under a curve)
2. Word2Vec (for computing the “space” covered by the meaning of a word, or a gene, or a protein)
7. Traditional representations of a word's meaning
1. A dictionary (or the PDB): not too useful in computational linguistics research.
2. WordNet (cf. protein networks): a graph of words, with relationships like “is-a” and synonym sets.
Problems: it depends on human labeling, hence misses a lot, and the process is hard to automate.
3. Atomic symbols: hotel, motel, each equivalent to a 1-hot vector:
Hotel: [0,0,0,0,0,0,0,0,1,0,0,0,0,0]
Motel: [0,0,0,0,1,0,0,0,0,0,0,0,0,0]
These are called one-hot representations. They are very long: 13M dimensions for the Google web crawl. Example: an inverted index.
8. Problems
One-hot vectors carry no natural notion of similarity:
hotel · motel^T = 0
They are also very long: roughly 20 thousand words for daily speech, 50k for machine translation, 20 thousand proteins, millions of species, and 13M words in the Google web crawl (see the sketch below).
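A tiny sketch of the similarity problem, using the two one-hot vectors from the previous slide (the vocabulary size of 14 is just the slide's toy example):

```python
import numpy as np

V = 14                                # toy vocabulary size from the slide
hotel = np.zeros(V); hotel[8] = 1.0   # 1 in the 9th position, as on the slide
motel = np.zeros(V); motel[4] = 1.0   # 1 in the 5th position

print(hotel @ motel)                  # 0.0: orthogonal, so "no similarity"
```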
9. How do we solve the problem? This is what Newton and Leibniz did for calculus:
[Figure: approximating the area under a curve with rectangles; the approximation becomes exact as #rectangles → ∞]
10. Let's do something similar:
1. Use a lot of “rectangles”, i.e., a vector of numbers, to approximate the meaning of a word.
2. What represents the meaning of a word?
“You shall know a word by the company it keeps” – J.R. Firth, 1957
Thus, our goal is to assign each word a vector such that similar words have similar vectors (by dot product).
We will take J.R. Firth at his word and use a neural network to train (low-dimensional) vectors such that each time two words appear together in a text, their vectors get slightly closer. This lets us use a massive corpus without annotation! We scan through the training data looking at a window of 2d+1 words at a time: given the center word, we try to predict the d words on its left and the d words on its right (see the sketch below).
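The window scan can be sketched in a few lines of Python (the corpus and window size d are placeholders):

```python
def skipgram_pairs(tokens, d=2):
    """Yield (center, context) pairs from a sliding window of 2d+1 words."""
    for t, center in enumerate(tokens):
        for off in range(-d, d + 1):
            if off != 0 and 0 <= t + off < len(tokens):
                yield center, tokens[t + off]

corpus = "you shall know a word by the company it keeps".split()
pairs = list(skipgram_pairs(corpus, d=2))
# e.g. ('know', 'you'), ('know', 'shall'), ('know', 'a'), ('know', 'word'), ...
```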
11. To design a neural network for this:
More specifically, for a center word w_t and “context words” w_t′ within a window of some fixed size, say 5 (t′ = t−1, …, t−5, t+1, …, t+5), we use a neural network to predict all w_t′, i.e., to maximize:
p(w_t′ | w_t) = …
This has a loss function
L = 1 − p(w_t′ | w_t)
By looking at many positions in a big corpus and adjusting the vectors to minimize this loss, we arrive at a (low-dimensional) vector approximation of the meaning of each word, in the sense that two words that often occur in close proximity are considered similar.
12. To design a neural network for this:
Thus the objective function is: maximize the probability of every context word given the current center word:
L′(θ) = Π_{t=1..T} Π_{d=−5..−1,1..5} P(w_{t+d} | w_t, θ)
where θ is all the variables we optimize (i.e., the vectors) and T = |training text|.
Taking the negative logarithm (and averaging per word) gives a quantity to minimize:
L(θ) = −1/T Σ_{t=1..T} Σ_{d=−5..−1,1..5} log P(w_{t+d} | w_t)
Then what is P(w_{t+d} | w_t)? We can take the vector dot product and then a softmax to approximate it; letting v be the vector for word w:
L(θ) ≈ −1/T Σ_{t=1..T} Σ_{d=−5..−1,1..5} log Softmax(v_{t+d} · v_t)
13. To design a neural network for this:
From the last slide:
L(θ) ≈ −1/T Σ_{t=1..T} Σ_{d=−5..−1,1..5} log Softmax(v_{t+d} · v_t)
The softmax for a center word c and a context/outside word o is
Softmax(v_o · v_c) = e^(v_o · v_c) / Σ_{k=1..V} e^(v_k · v_c)
Note: the index k runs over the dictionary of size V, not the whole text T (see the sketch below).
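In code, the full-softmax probability looks like this (U holds the context vectors u_k as rows and v_c is the center vector; the names and sizes are our own illustration):

```python
import numpy as np

def p_context_given_center(U, v_c, o):
    # P(o | c) = exp(u_o . v_c) / sum_{k=1..V} exp(u_k . v_c)
    scores = U @ v_c                      # one dot product per vocabulary word
    scores -= scores.max()                # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

rng = np.random.default_rng(0)
V, N = 1000, 50                           # vocabulary size, vector size
U = rng.normal(scale=0.01, size=(V, N))   # context vectors, randomly initialized
v_c = rng.normal(scale=0.01, size=N)      # one center vector
print(p_context_given_center(U, v_c, o=42))   # roughly 1/V before training
```

Note that the denominator costs O(V) dot products per training position, which is exactly the expense the next slide's negative sampling avoids.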
14. Negative Sampling in Word2Vec
• In our L(θ) ≈ −1/T Σ_{t=1..T} Σ_{d=−5..−1,1..5} log Softmax(v_{t+d} · v_t), where
Softmax(v_o · v_c) = e^(v_o · v_c) / Σ_{k=1..V} e^(v_k · v_c),
we must each time calculate Σ_{k=1..V} e^(v_k · v_c), which is too expensive.
• To overcome this, use negative sampling. The objective function is L(θ) = −1/T Σ_{t=1..T} L_t(θ), where
L_t(θ) = log σ(u_o^T v_c) + Σ_{i=1..k} E_{j~P(w)} [log σ(−u_j^T v_c)]
       = log σ(u_o^T v_c) + Σ_{j~P(w)} log σ(−u_j^T v_c)
• Here the sigmoid function σ(x) = 1/(1 + e^(−x)) is treated as a probability. That is, we maximize the first term and take k = 10 random samples in the second term.
• For sampling, we can use the unigram distribution U(w), or U(w)^(3/4) to boost rare words (see the sketch below).
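A minimal sketch of the per-position negative-sampling objective, with sampling from the 3/4-power unigram distribution (the variable names are ours; a real implementation would precompute the sampling table):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(u_o, v_c, U, unigram_counts, k=10, rng=np.random.default_rng()):
    # Negative of L_t: minimize -log sigma(u_o.v_c) - sum_j log sigma(-u_j.v_c)
    p = unigram_counts ** 0.75
    p = p / p.sum()                         # U(w)^(3/4): boosts rare words
    negs = rng.choice(len(p), size=k, p=p)  # k random "noise" words
    loss = -np.log(sigmoid(u_o @ v_c))      # pull the true context word closer
    loss -= np.log(sigmoid(-(U[negs] @ v_c))).sum()  # push noise words away
    return loss
```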
15. The skip-gram model
• Vocabulary size: V
• Input layer: the center word in 1-hot form.
• The k-th row of W_{V×N} is the center vector of the k-th word.
• The k-th column of W′_{N×V} is the context vector of the k-th word in V. Note: each word has 2 vectors, both randomly initialized.
• The output column y_{ij}, i = 1..C, is computed in 3 steps (see the sketch below):
1) Use the context word's 1-hot vector to choose its column in W′_{N×V}
2) Take the dot product with h_i, the center word's vector
3) Compute the softmax
C = context window size
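A sketch of this forward pass with the two weight matrices (the shapes follow the slide; the initialization scale is our choice):

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 1000, 50
W  = rng.normal(scale=0.01, size=(V, N))  # row k: center vector of word k
Wp = rng.normal(scale=0.01, size=(N, V))  # column k: context vector of word k

def skipgram_forward(center_id):
    h = W[center_id]            # the 1-hot input just selects a row of W
    scores = h @ Wp             # steps 1-2: dot h with every context column
    scores -= scores.max()
    return np.exp(scores) / np.exp(scores).sum()   # step 3: softmax over V

y = skipgram_forward(7)         # predicted distribution over context words
```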
16. The Training of θ
• We train both W_{V×N} and W′_{N×V}, i.e., we compute all vector gradients.
• Thus θ lives in the space R^{2NV}, where N is the vector size and V is the number of words.
• We need ∂L(θ)/∂v for every vector v in θ:
θ = [v_aardvark, v_a, …, v_zebra, u_aardvark, u_a, …, u_zebra] ∈ R^{2NV}
17. Gradient Descent
• θ_new = θ_old − α ∂L(θ_old)/∂θ_old
• Stochastic gradient descent (SGD): just update one position (one center word and its context words) at a time (see the sketch below).
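One SGD step for a single position, combined with negative sampling, might look like this (the gradients follow from differentiating L_t on slide 14; the names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(W, Wp, c, o, negs, alpha=0.025):
    """Update W (center vectors, V x N) and Wp (context vectors, N x V) in place
    for center word c, true context word o, and a list of sampled negatives."""
    v_c = W[c]
    grad_v = np.zeros_like(v_c)
    for j, label in [(o, 1.0)] + [(j, 0.0) for j in negs]:
        u_j = Wp[:, j].copy()
        g = sigmoid(u_j @ v_c) - label     # prediction error for this word
        grad_v += g * u_j
        Wp[:, j] -= alpha * g * v_c        # gradient step on the context vector
    W[c] -= alpha * grad_v                 # gradient step on the center vector
```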
18. CBOW
• What if we predict the center word given the context words, the opposite of the skip-gram model?
• Yes: this is the Continuous Bag Of Words (CBOW) model in the original Word2Vec paper.
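Reusing the matrices from the skip-gram sketch, the CBOW direction is just an averaged lookup (averaging the context vectors is the usual choice; this is a sketch, not the reference implementation):

```python
import numpy as np

def cbow_forward(W, Wp, context_ids):
    h = W[context_ids].mean(axis=0)   # average the context words' vectors
    scores = h @ Wp                   # score every candidate center word
    scores -= scores.max()
    return np.exp(scores) / np.exp(scores).sum()   # distribution over centers
```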
22. An interesting application in materials science
• Nature, July 2019: V. Tshitoyan et al., “Unsupervised word embeddings capture latent knowledge from materials science literature”.
• Lawrence Berkeley Lab materials scientists applied word embeddings to 3.3 million scientific abstracts published between 1922 and 2018. V = 500k; vector size: 200 dimensions; skip-gram model.
• With no explicit insertion of chemical knowledge, the embeddings captured things like the periodic table and structure-property relationships in materials:
ferromagnetic − NiFe + IrMn ≈ antiferromagnetic
• Discovered new thermoelectric materials: “would be years before their discovery” (see the sketch below).
• Can you do something similar for proteins?
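The analogy above is plain vector arithmetic followed by a cosine nearest-neighbor search; a sketch (the embeddings dictionary is hypothetical):

```python
import numpy as np

def analogy(emb, a, b, c, topn=1):
    """Words whose vectors are closest (by cosine) to emb[a] - emb[b] + emb[c]."""
    target = emb[a] - emb[b] + emb[c]
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    ranked = sorted((w for w in emb if w not in {a, b, c}),
                    key=lambda w: cos(emb[w], target), reverse=True)
    return ranked[:topn]

# analogy(embeddings, "ferromagnetic", "NiFe", "IrMn")  ->  ["antiferromagnetic"]
```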
23. Beyond Word2Vec
• Co-occurrence matrix (see the sketch below):
Window-based co-occurrence
Document-based co-occurrence
Ignore overly frequent words: the, he, has, …
Weigh close proximity more
• In word2vec, if a center word w appears again we have to repeat the whole process; here all its occurrences are processed together. Documents can also be considered.
• The matrix is symmetric.
• SVD decomposition of this matrix predates Word2Vec, but it is O(nm²), too slow for large data.
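A sketch of a window-based co-occurrence matrix with the refinements listed above (the stopword set and weights are placeholders):

```python
import numpy as np

def cooccurrence(tokens, vocab, window=2, stop=frozenset({"the", "he", "has"})):
    idx = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for t, w in enumerate(tokens):
        if w in stop:
            continue
        for off in range(-window, window + 1):
            if off != 0 and 0 <= t + off < len(tokens):
                v = tokens[t + off]
                if v not in stop:
                    M[idx[w], idx[v]] += 1.0 / abs(off)  # closer occurrences weigh more
    return M   # symmetric by construction

# The pre-Word2Vec route: truncated SVD of M to get low-dimensional vectors.
# U, S, _ = np.linalg.svd(M); vectors = U[:, :50] * S[:50]
```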
24. GloVe (Global Vectors model)
• Combines the Word2Vec and co-occurrence matrix approaches. Minimize
L(θ) = ½ Σ_{i,j=1..W} f(P_{ij}) (u_i^T v_j − log P_{ij})²
where the u, v vectors are as before and P_{ij} is the count of co-occurrences of words i and j.
Essentially, this says: the more often words i and j co-occur, the larger the dot product u_i^T v_j should be. f down-weights overly frequent co-occurrences (see the sketch below).
• What about having two vectors per word? X = U + V works.
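The GloVe objective written out (the cap x_max = 100 and exponent 3/4 inside f follow the published GloVe paper, not these slides; the rest is a sketch):

```python
import numpy as np

def glove_loss(U, Vm, P):
    # L = 1/2 * sum_{i,j} f(P_ij) * (u_i . v_j - log P_ij)^2, over nonzero P_ij
    def f(x, x_max=100.0, alpha=0.75):   # down-weights very frequent pairs
        return (x / x_max) ** alpha if x < x_max else 1.0
    total = 0.0
    for i, j in zip(*np.nonzero(P)):
        diff = U[i] @ Vm[j] - np.log(P[i, j])
        total += 0.5 * f(P[i, j]) * diff ** 2
    return total

# Final word vectors: X = U + Vm (sum the two sets of vectors row-wise)
```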
25. Summary
• We have learned that representation is important: when you represent the space under a curve by a lot of rectangles, you can approximate the curve, hence calculus; when you represent a word by a vector, other close-in-meaning words can take nearby vectors, as measured by cosine distance.
• When a representation (a short vector) of an object supports similarity measures, we can easily design neural networks (or other approaches) that place words with close meanings in close vicinity.
26. Literature & Resources for Word2Vec
Bengio et al., 2003. A neural probabilistic language model.
Collobert & Weston, 2008. NLP (almost) from scratch.
Mikolov et al., 2013. The word2vec paper.
Pennington, Socher & Manning, 2014. The GloVe paper.
Rohde et al., 2005 (the SVD paper). An improved model of semantic similarity based on lexical co-occurrence.
Plus thousands more.
Resources:
https://mccormickml.com/2016/04/27/word2vec-resources/
https://github.com/clulab/nlp-reading-group/wiki/Word2Vec-Resources
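For hands-on experiments, the gensim library packages all of the above; a minimal usage sketch (parameter names follow gensim 4.x; the one-sentence corpus is a placeholder):

```python
from gensim.models import Word2Vec

sentences = [["you", "shall", "know", "a", "word", "by", "the", "company", "it", "keeps"]]
model = Word2Vec(sentences, vector_size=200, window=5,
                 sg=1,            # sg=1 selects skip-gram (sg=0 is CBOW)
                 negative=10,     # k = 10 negative samples
                 min_count=1)
vec = model.wv["word"]            # the trained 200-dimensional vector
# model.wv.most_similar("word")   # cosine nearest neighbors
```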
27. Project Ideas
1. Similar to V. Tshitoyan et al.'s work, can you explore all of the biological literature and find interesting facts, such as protein-protein interactions or biological name identity resolution?
2. Can we use word embeddings to embed genomes (for example, shatter genomes into pieces as “words”, but train one vector per genome), and hence cluster species to build phylogenies that reflect their evolutionary history, similar to [1,2]? You can start with mitochondrial genomes, virus genomes (such as different strains of COVID-19, for which you can add other factors such as geography and time), or bacterial genomes.
1. M. Li, J. Badger, X. Chen, S. Kwong, P. Kearney, H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17:2 (2001), 149-154.
2. C.H. Bennett, M. Li, B. Ma. Chain letters and evolutionary histories. Scientific American 288:6 (June 2003), 76-81 (feature article).