Slides explaining thatsfordinner.com, a fast recipe browser for maximal inspiration in minimal time. Uses latent Dirichlet allocation (LDA) to group recipes with shared ingredients.
3. Data
● 16K recipes scraped from epicurious.com and allrecipes.com.
● Ingredient and title words are the features.
● Typical NLP preprocessing and feature selection.
● Vocabulary size: about 1800 words.
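The preprocessing step could look roughly like the sketch below. It is not the deck's actual pipeline: the stop list, the `min_doc_freq` threshold, and all function names are assumptions made for illustration. It tokenizes ingredient text, drops measurement words, keeps words that occur in enough recipes, and builds a recipe-by-word count matrix.

```python
import re
from collections import Counter

# Hypothetical stop list (units and filler words); the deck does not give its own list.
STOP_WORDS = {"cup", "cups", "tsp", "tbsp", "pound", "pounds", "piece", "stick", "of", "and"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_vocabulary(docs, min_doc_freq=2):
    """Feature selection: keep words that appear in at least `min_doc_freq` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return sorted(w for w, c in doc_freq.items() if c >= min_doc_freq)


def count_matrix(docs, vocab):
    """Return one row of word counts per document, with columns ordered like `vocab`."""
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            if tok in index:
                row[index[tok]] += 1
        rows.append(row)
    return rows


recipes = [
    "1 tsp soy sauce, 1 cup rice, 1 piece ginger, 1 chili pepper",
    "1 avocado, 1 tbsp cilantro, 2 tbsp onion, 1 chili pepper",
]
vocab = build_vocabulary(recipes, min_doc_freq=2)
matrix = count_matrix(recipes, vocab)
```

With only two toy recipes, the vocabulary shrinks to the words they share; on the full 16K-recipe corpus a similar filter would be one way to arrive at a vocabulary of a couple of thousand words.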
4. Algorithm: Topic model (Latent Dirichlet Allocation)
Example topics (word, probability):
Topic 1: Potatoes 0.7, Russet 0.1, Gold 0.1, Yukon 0.1
Topic 2: Chili 0.4, Cilantro 0.4, Avocado 0.2
Topic 3: Ginger 0.4, Rice 0.3, Soy 0.3
● Documents are mixtures of topics, and words in documents are drawn from topics:
○ 1 tsp soy sauce, 1 cup rice, 1 piece ginger, 1 chili pepper
○ 3 pounds russet potatoes, 3 pounds Yukon Gold potatoes, 1 stick butter, 1 cup whole milk, salt
○ 1 avocado, 1 tbsp cilantro, 2 tbsp onion, 1 chili pepper
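To make the "mixture of topics" idea concrete, here is a rough illustration using the word probabilities from this slide. It is not LDA inference: each known word is simply credited to the topic that gives it the highest probability, and the topic names `topic_1` to `topic_3` and the function `topic_shares` are made up for this sketch.

```python
import re

# Word probabilities copied from the slide; the topic names are placeholders.
TOPICS = {
    "topic_1": {"potatoes": 0.7, "russet": 0.1, "gold": 0.1, "yukon": 0.1},
    "topic_2": {"chili": 0.4, "cilantro": 0.4, "avocado": 0.2},
    "topic_3": {"ginger": 0.4, "rice": 0.3, "soy": 0.3},
}


def topic_shares(recipe):
    """Share of a recipe's known words credited to each topic (crude heuristic)."""
    counts = {name: 0 for name in TOPICS}
    for token in re.findall(r"[a-z]+", recipe.lower()):
        # Find the topic that assigns this word the highest probability, if any.
        best = max(TOPICS, key=lambda name: TOPICS[name].get(token, 0.0))
        if TOPICS[best].get(token, 0.0) > 0.0:
            counts[best] += 1
    total = sum(counts.values())
    return {name: (c / total if total else 0.0) for name, c in counts.items()}


recipes = [
    "1 tsp soy sauce, 1 cup rice, 1 piece ginger, 1 chili pepper",
    "3 pounds russet potatoes, 3 pounds Yukon Gold potatoes, 1 stick butter, 1 cup whole milk, salt",
    "1 avocado, 1 tbsp cilantro, 2 tbsp onion, 1 chili pepper",
]
shares = [topic_shares(r) for r in recipes]
```

The first recipe comes out as a blend (three words from the ginger/rice/soy topic, one from the chili topic), while the other two each sit entirely in one topic.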
7. Alexander Jerneck
● Programming enthusiast living inside a social scientist (Ph.D., Sociology).
● I did this project because
○ I want to be able to read everything.
○ I want to help people find new things they didn’t know they wanted.
9. LDA: Generative process
1. For each topic $k$,
a. Draw a distribution over words $\beta_k \sim \mathrm{Dir}_V(\eta)$.
2. For each document $d$,
a. Draw a vector of topic proportions $\theta_d \sim \mathrm{Dir}(\alpha)$.
b. For each word $n$,
i. Draw a topic assignment $z_{d,n} \sim \mathrm{Mult}(\theta_d)$, $z_{d,n} \in \{1, \dots, K\}$.
ii. Draw a word $w_{d,n} \sim \mathrm{Mult}(\beta_{z_{d,n}})$, $w_{d,n} \in \{1, \dots, V\}$.
V is the size of the vocabulary.
K is the number of topics.
η and α are hyperparameters determining the shape of the Dirichlet distributions.
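The generative process can be simulated in a few lines. The following is a minimal sketch using only the standard library, assuming symmetric Dirichlet priors and a fixed document length; the names `sample_dirichlet` and `generate_corpus` are hypothetical, and topics and words are numbered from 0 rather than 1.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_length, n_topics, vocab_size, alpha, eta, seed=0):
    """Sample topics, topic proportions, and documents following the LDA generative process."""
    rng = random.Random(seed)
    # Step 1: one distribution over words, beta_k, per topic.
    beta = [sample_dirichlet(rng, eta, vocab_size) for _ in range(n_topics)]
    thetas, docs = [], []
    for _ in range(n_docs):
        # Step 2a: topic proportions theta_d for this document.
        theta = sample_dirichlet(rng, alpha, n_topics)
        words = []
        for _ in range(doc_length):
            # Step 2b-i: topic assignment z_{d,n} drawn from theta_d.
            z = rng.choices(range(n_topics), weights=theta)[0]
            # Step 2b-ii: word w_{d,n} drawn from the chosen topic's word distribution.
            w = rng.choices(range(vocab_size), weights=beta[z])[0]
            words.append(w)
        thetas.append(theta)
        docs.append(words)
    return beta, thetas, docs


beta, thetas, docs = generate_corpus(
    n_docs=3, doc_length=5, n_topics=2, vocab_size=4, alpha=0.5, eta=0.5, seed=0
)
```

Smaller values of `alpha` and `eta` push the sampled proportions toward a few dominant topics per document and a few dominant words per topic, which is the shape effect of the hyperparameters mentioned above.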
10. LDA: Inference
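The posterior:
$$
p(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K} \mid w_{1:D,1:N}, \alpha, \eta)
= \frac{p(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K}, w_{1:D,1:N} \mid \alpha, \eta)}
{\int_{\beta_{1:K}} \int_{\theta_{1:D}} \sum_{z} p(\theta_{1:D}, z_{1:D,1:N}, \beta_{1:K}, w_{1:D,1:N} \mid \alpha, \eta)}
$$
is intractable because of the integral in the denominator.
I use collapsed Gibbs sampling, as implemented in the Python lda package, to sample from the posterior.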
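For intuition, a collapsed Gibbs sampler for LDA fits in a short function. The sketch below is a plain-Python illustration of the technique, not the lda package's implementation; the function name `collapsed_gibbs`, the default hyperparameters, and the toy corpus are assumptions. It integrates out θ and β, resamples each topic assignment from its conditional given all other assignments, and then forms point estimates of θ and β from the final counts.

```python
import random


def collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                              # total words assigned to topic k
    assignments = []

    # Random initialization of every topic assignment.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional p(z = k | all other assignments, words), up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Point estimates from the final state of the chain.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    beta = [
        [(c + eta) / (topic_total[k] + vocab_size * eta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, beta


# Toy corpus: documents 0 and 2 use words {0, 1}; document 1 uses words {2, 3}.
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, beta = collapsed_gibbs(toy_docs, n_topics=2, vocab_size=4, n_iter=50)
```

To my understanding, the lda package exposes the analogous estimates as `doc_topic_` and `topic_word_` on a fitted `lda.LDA` model, so in practice the loop above would be replaced by a call to its `fit` method on the recipe-by-word count matrix.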