A theory from anthropologist and psychologist Robin Dunbar states that the brain capacity of humans limits the number of stable social relationships they can maintain to about 150. But what does that mean for B2B organizations with countless contacts?
Distributed Digital Artifacts on the Semantic Web (Editor IJCATR)
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment, producing high-quality, verifiable, and unchangeable web resources that help prevent the rising threat of man-in-the-middle attacks. The greatest challenge of a centralized system is that it gives users no way to check whether data have been modified, and communication is limited to a single server. The distributed digital artifact system addresses this by distributing resources among different domains to enable inter-domain communication. With emerging developments on the web, attacks have increased rapidly; among them, the man-in-the-middle attack (MIMA) is a serious threat to user security. This work tries to prevent MIMA to an extent by providing self-reference and trusty URIs even in a distributed environment. Any manipulation of the data is efficiently identified, and further access to that data is blocked by informing the user that the uniform location has been changed. The system uses self-reference so that each resource contains its own trusty URI, a lineage algorithm to generate the seed, and the SHA-512 hash algorithm to ensure security. It is implemented on the semantic web, an extension of the World Wide Web, using RDF (Resource Description Framework) to identify resources. The framework was thus developed to overcome existing challenges by making digital artifacts on the semantic web distributed, enabling secure communication between different domains across the network and thereby preventing MIMA.
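The core idea, a hash of the content embedded in the URI and recomputed whenever the resource is fetched, can be sketched as follows. This is a minimal illustration assuming SHA-512 (as named in the abstract) and an unpadded base64url suffix appended after a dot; the function names, the `example.org` URIs, and the suffix layout are hypothetical, and the sketch does not reproduce the encoding of the published trusty URI specification or the paper's lineage-based seed generation.

```python
import base64
import hashlib


def content_hash(content: bytes) -> str:
    """Return the SHA-512 digest of the content as unpadded base64url text."""
    digest = hashlib.sha512(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_trusty_uri(base_uri: str, content: bytes) -> str:
    """Append the content hash to a base URI (hypothetical layout: base + '.' + hash)."""
    return f"{base_uri}.{content_hash(content)}"


def verify(uri: str, content: bytes) -> bool:
    """Recompute the hash of the received content and compare it with the URI suffix.

    base64url output never contains '.', so the text after the last '.' is the
    expected hash. A mismatch means the content changed after the URI was minted
    (for example, altered by a man-in-the-middle), so the client should reject it.
    """
    _, _, expected = uri.rpartition(".")
    return expected == content_hash(content)


if __name__ == "__main__":
    data = b'<http://example.org/s> <http://example.org/p> "o" .'
    uri = make_trusty_uri("http://example.org/resource1", data)
    print(verify(uri, data))                 # True: untouched content
    print(verify(uri, data + b" tampered"))  # False: modified content
```

Because the identifier is derived from the content itself, any party holding the URI can check integrity without trusting the server that delivered the bytes.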
Metrics that matter: Making the business case that documentation has value (Publishing Smarter)
Presented at CMS/DITA North America 2016 to help people tell the story of content as a business asset. We agree there is value in documentation but have at times been challenged to “prove it”. Includes a demo of how to present to groups including sales, support, service, IT, engineering, QA/testing, manufacturing, HR, training, finance, marketing, and every other business unit in your organization, and a discussion for managers and executives of how documentation drives sales and generates corporate revenue, helping them see how important documentation is to them.
Benchmarking the Accounting & Finance Function: 2014 Summary Presentation (Robert Half)
Is your finance and accounting team ready to drive your success throughout 2014? Robert Half’s fifth annual Benchmarking the Accounting & Finance Function report provides metrics on staffing, financial systems, outsourcing and more. Find out how your company measures up to its peers.
National Centre for Student Equity in Higher Education (NCSEHE) Director Professor Sue Trinidad presents, "Student equity: policy and practice" at the ACER-sponsored Strategies for Student Retention conference held in Melbourne on Tuesday 29 and Wednesday 30 September 2015. Professor Trinidad provides an overview of the NCSEHE's work, including the development of student personas in order to better identify cohorts of students requiring additional support, and strategies with which to assist.
This presentation gives an overview of the main concepts introduced at the EDBT2015 Summer School, which took place in Palamos. For each area, we summarize the main issues and current approaches. We also describe the challenges and main activities undertaken in the summer school.
Knowledge graphs for knowing more and knowing for sure (Steffen Staab)
Knowledge graphs have been conceived to collect heterogeneous data and knowledge about large domains, e.g. medical or engineering domains, and to allow versatile access to such collections by means of querying and logical reasoning. A surge of methods has responded to additional requirements in recent years. (i) Knowledge graph embeddings use similarity and analogy of structures to speculatively add to the collected data and knowledge. (ii) Queries with shapes and schema information can be typed to provide certainty about results. We survey both developments and find that the development of techniques happens in disjoint communities that mostly do not understand each other, thus limiting the proper and most versatile use of knowledge graphs.
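As one concrete instance of the "analogy of structures" idea in (i), the translational embedding model TransE (chosen here for illustration; the talk does not name a specific model) scores a candidate triple (h, r, t) by how closely h + r lands on t. The two-dimensional vectors, the entity names, and the relation vector below are made up; real embeddings are learned from the graph.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def transe_distance(h: Vector, r: Vector, t: Vector) -> float:
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(h: Vector, r: Vector, candidates: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank candidate tail entities for the query (h, r, ?) by distance, best first."""
    scored = [(name, transe_distance(h, r, t)) for name, t in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical toy embeddings; in practice these are trained on known triples.
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [3.0, 0.0],
    "Germany": [3.0, 1.0],
}
capital_of = [0.0, 1.0]  # assumed relation vector: "move up by one unit"

ranking = rank_tails(entities["Berlin"], capital_of, entities)
print(ranking[0][0])  # Germany: the speculatively added (Berlin, capitalOf, ?) link
```

The top-ranked tail is a guess, not a certainty, which is exactly the contrast the abstract draws with typed queries in (ii).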
Reviews on Deep Generative Models in the early days / GANs & VAEs paper review (changedaeoh)
A review of early generative models such as GANs and VAEs.
Coverage is as follows:
- VAE (2013)
- GAN (2014)
- Conditional GAN (2014)
- Conditional VAE (2015)
- Deep Convolutional GAN (2015)
- Information Maximizing GAN (2016)
Slides for the TAVE research seminar, 21.07.06
Presenter: Changdae Oh (오창대)
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs (Paris Sud University)
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs [F. Saïs, J. E. Gonzales Malaverri and G. Quercini @ACM-SAC-SWA 2020]
This paper deals with the problem of temporal meta-fact generation in RDF knowledge graphs (KGs). These temporal meta-facts represent the time validity of facts; for instance, <Barack Obama, presidentOf, United States of America> is valid for the period [2008..2016]. We propose an approach called MOMENT that combines two methods. The first uses a set of specified rules to generate meta-facts in knowledge bases where no temporal meta-facts exist. The second exploits existing temporal meta-facts and a set of Horn rules generated by AMIE [9] to propagate the meta-facts and thus expand the set of temporal meta-facts. An experimental evaluation has been conducted using the Yago, DBpedia, and Wikidata datasets. The results obtained are promising and show the relevance of such an approach for temporal meta-fact generation in knowledge graphs.
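The propagation step can be pictured with a small sketch. It assumes, as a simplification rather than MOMENT's actual procedure, that a fact derived by a two-atom Horn rule inherits the intersection of its body facts' validity periods. The rule `presidentOf(x, z) AND hasCapital(z, y) => workedIn(x, y)`, the `hasCapital` fact, and its period are hypothetical; only the Obama interval comes from the abstract.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[int, int]  # inclusive validity period in years, e.g. (2008, 2016)


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Overlap of two inclusive intervals, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def propagate_chain_rule(
    facts: Dict[Triple, Interval], p: str, r: str, q: str
) -> Dict[Triple, Interval]:
    """Apply the Horn rule p(x, z) AND r(z, y) => q(x, y) to time-stamped facts.

    Assumption: a derived fact holds only while all body facts hold, so it gets
    the intersection of their periods. Facts already in the graph are left
    untouched, and the first derivation of a new fact wins.
    """
    derived: Dict[Triple, Interval] = {}
    for (x, pred1, z), period1 in facts.items():
        if pred1 != p:
            continue
        for (z2, pred2, y), period2 in facts.items():
            if pred2 != r or z2 != z:
                continue
            head = (x, q, y)
            period = intersect(period1, period2)
            if period is not None and head not in facts and head not in derived:
                derived[head] = period
    return derived


# Toy knowledge graph; the second fact is made up for illustration.
kg = {
    ("Barack_Obama", "presidentOf", "USA"): (2008, 2016),
    ("USA", "hasCapital", "Washington_DC"): (1800, 2100),
}
print(propagate_chain_rule(kg, "presidentOf", "hasCapital", "workedIn"))
# {('Barack_Obama', 'workedIn', 'Washington_DC'): (2008, 2016)}
```

Intersecting periods is a conservative choice: it never asserts that a derived fact holds at a time when one of its premises did not.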
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
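To make the VAE training procedure concrete, here is a minimal sketch of the two pieces usually singled out: the reparameterization trick for sampling the latent code, and the closed-form KL term between a diagonal Gaussian encoder and a standard normal prior. It uses plain Python lists instead of a deep learning framework, and the function names are illustrative rather than taken from any library.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(logvar / 2).

    Writing the sample this way keeps z a differentiable function of mu and
    logvar, which is what lets a VAE backpropagate through the sampling step.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def negative_elbo(reconstruction_nll: float, mu: List[float], logvar: List[float]) -> float:
    """Per-example loss minimized during training: reconstruction term plus KL regularizer."""
    return reconstruction_nll + kl_to_standard_normal(mu, logvar)


rng = random.Random(0)
z = reparameterize([0.5, -1.0], [0.0, -2.0], rng)
print(len(z))                               # 2
print(kl_to_standard_normal([0.0], [0.0]))  # 0.0: the encoder already matches the prior
print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
```

In a GAN there is no such likelihood-based objective; the generator is instead trained against a discriminator, which is the main contrast between the two training procedures.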
The Nucleon Parton Distribution Functions from Lattice QCD (Christos Kallidonis)
We present results on the nucleon valence quark distribution extracted from Lattice QCD simulations, using a gauge ensemble of $N_f=2+1$ Wilson-Clover fermions with a pion mass of $m_\pi = 350$ MeV and a lattice spacing of $a \approx 0.093$ fm. We obtain reduced Ioffe Time Distributions (rITDs) by computing the appropriate matrix elements on the lattice, and elaborate on the extraction of the desired quark distributions from the rITDs following the pseudo-PDF approach. A set of techniques is considered in order to ensure ground-state dominance. Theoretical and experimental implications of our calculation are discussed.
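For reference, the reduced Ioffe time distribution in the pseudo-PDF approach is commonly defined as a double ratio of lattice matrix elements $M(p, z)$ at hadron momentum $p$ and quark-antiquark separation $z$, with Ioffe time $\nu = p z$. The ratio cancels the $z$-dependent renormalization of the operator and enforces $\mathfrak{M}(0, z^2) = 1$. The notation follows common usage in that literature rather than the talk itself, and the second relation holds only at leading twist, before the perturbative matching to the $\overline{\mathrm{MS}}$ valence distribution $q_v(x)$ (normalized here to unit integral).

$$
\mathfrak{M}(\nu, z^2) = \frac{M(p, z)\, M(0, 0)}{M(p, 0)\, M(0, z)}, \qquad \nu = p z,
$$

$$
\operatorname{Re}\,\mathfrak{M}(\nu, z^2) \approx \int_0^1 dx\, \cos(\nu x)\, q_v(x).
$$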
The term 'Data Scientist' arose fairly recently to express the specialised recruitment needs of certain well-known data-driven Silicon Valley firms. It signifies a mix of diverse and rare talents, drawing mostly from Computer Science (with an emphasis on Big Data), Statistics, and Machine Learning. In this talk, we will briefly survey the state of the art, in terms of both problems and solutions, at the vanguard of Data Science. We will cover novel developments as well as centuries-old best practices, in an attempt to demonstrate that Data Science is indeed a science in the full sense of the word. This talk is part of a seminar series that the speaker has given around the world, including at Google (Mountain View), Cisco (San Jose), and Aviva Headquarters (London), and represents joint work with Professor David Hand (OBE).
Slides from our PacificVis 2015 presentation.
The paper tackles the problem of “giant hairballs”, the dense and tangled structures that often result from visualizing large social graphs. It proposes a high-dimensional rotation technique called AGI3D, combined with the ability to filter elements based on social centrality values. AGI3D targets a high-dimensional embedding of a social graph and its projection onto 3D space. It allows the user to rotate the social graph layout in the high-dimensional space by dragging a vertex with the mouse. The high-dimensional rotation gives the user the illusion of destructively reshaping the social graph layout; in reality, it helps the user find a preferred position and direction in the high-dimensional space from which to view the internal structure of the layout, which itself remains unmodified. A prototype implementation, called Social Viewpoint Finder, was tested with about 70 social graphs, and this paper reports four of the analysis results.
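Why such a rotation changes the view without changing the layout can be sketched with a single Givens rotation: vertex coordinates from a d-dimensional embedding are rotated within one coordinate plane and then projected onto the first three axes for display. This is a generic illustration assuming an orthogonal projection and a plain plane rotation; it is not AGI3D's mouse-drag formulation, and the coordinates are made up.

```python
import math
from typing import List

Point = List[float]


def givens_rotate(points: List[Point], i: int, j: int, theta: float) -> List[Point]:
    """Rotate every point by angle theta in the (i, j) coordinate plane.

    Only coordinates i and j change, and the rotation is orthogonal, so all
    distances between vertices are preserved: the layout itself is not deformed.
    """
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for p in points:
        q = list(p)
        q[i] = c * p[i] - s * p[j]
        q[j] = s * p[i] + c * p[j]
        rotated.append(q)
    return rotated


def project_to_3d(points: List[Point]) -> List[Point]:
    """Orthogonal projection onto the first three axes (what the user sees)."""
    return [p[:3] for p in points]


# Hypothetical 5-dimensional embedding of three vertices.
layout = [
    [1.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
]
# Rotating visible axis 0 toward hidden axis 3 brings hidden structure into view.
view = project_to_3d(givens_rotate(layout, 0, 3, math.pi / 2))
print(view[0])  # approximately [-2.0, 0.0, 0.0]
```

The apparent "reshaping" on screen comes entirely from the projection: the rotated points are an exact isometric copy of the original embedding.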
"data: past, present, and future" day 1 lecture 2020-01-20chris wiggins
What should our future statisticians, senators, and CEOs know about the history and ethics of data? How might understanding that history provide tools and resources to future citizens navigating a future shaped by data-empowered algorithms? We've developed a course that introduces students, without prerequisites, to a historical view of our present condition, in which data-empowered algorithms shape our personal, professional, and political realities. The course attempts to integrate critical data studies with functional engagement with data (in Python via Jupyter notebooks), and interleaves an applied view of ethics throughout. The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
a mission-driven approach to personalizing the customer journey (chris wiggins)
Keynote talk at PyData NYC 2019 by Anne Bauer, Lead Data Scientist, The New York Times, and Chris Wiggins, Chief Data Scientist, The New York Times.
"Data science at The New York Times: a mission-driven approach to personalizing the customer journey"
How does The New York Times use data science to further its mission?
We'll talk about the use of machine learning throughout the company,
from social media promotion to targeted advertising to content
recommendations, and the cross-team collaborations that make it
possible.
Data Science at The New York Times: what industry can learn from us; what we ... (chris wiggins)
Keynote talk at RSG with DREAM 2019 (November 4-6, 2019, New York, USA), the 12th annual RECOMB/ISCB Conference on Regulatory & Systems Genomics.
Special Session on Cancer Systems Biology
Regulatory and Systems Genomics 2019 will include an abstract submissions track for a Special Session on Cancer Systems Biology. We welcome submissions on computational and experimental advances in the systems-level modeling of cancer. Topics include but are not limited to: regulatory programs and signaling pathways in cancer cells, tumor-immune interactions and the tumor microenvironment, developmental plasticity in tumors and epigenetic analyses, tumor metabolism, genetic and non-genetic sources of heterogeneity, drug response and precision oncology. The session will include presentations from keynote speakers as well as talks from selected abstracts. This special session is sponsored by the Research Center for Cancer Systems Immunology at Memorial Sloan Kettering Cancer Center, an NCI-funded Cancer Systems Biology Consortium (CSBC) Center.
slides uploaded by request
talk presented at the MIDAS seminar, University of Michigan, 2019-04-15. Video available via https://www.youtube.com/watch?v=c7t4LMkq_SU . For more information: https://midas.umich.edu/event/chris-wiggins/
abstract
The Data Science group at The New York Times develops and deploys
machine learning solutions to newsroom and business problems.
Re-framing real-world questions as machine learning tasks requires not
only adapting and extending models and algorithms to new or special
cases but also sufficient breadth to know the right method for the
right challenge. I'll first outline how unsupervised, supervised, and
reinforcement learning methods are increasingly used in human
applications for description, prediction, and prescription,
respectively. I'll then focus on the 'prescriptive' cases, showing how
methods from the reinforcement learning and causal inference
literatures can be of direct impact in engineering, business, and
decision-making more generally.
Talk delivered 2019-06-25 as part of the Summer Institute in Computational Social Science, held at Princeton University https://compsocialscience.github.io/summer-institute/2019/
Video: https://www.youtube.com/watch?v=0suLWheVji0
title:
what should future statisticians, CEOs, and senators know about the history and ethics of data?
abstract:
What should our future statisticians, senators, and CEOs know about the history and ethics of data?
How might understanding that history provide tools and resources to future citizens navigating a future shaped by data empowered algorithms?
I'll present content from a class co-developed over the past several years with Professor Matt Jones of Columbia's Department of History, based on material absent from the curricula of both future technologists and future humanists.
The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac... (chris wiggins)
Data-empowered algorithms are reshaping our professional, personal, and political realities.
However, existing curricula are predominantly designed either for future technologists, focusing on functional capabilities; or for future humanists, focusing on critical and rhetorical context surrounding data.
"Data: Past, Present, and Future" is a new course at Columbia which seeks to define a curriculum at present taught to neither group, yet of interest and utility to future statisticians, CEOs, and senators alike.
The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
The weekly cadence of the course pairs primary and secondary readings with Jupyter notebooks in Python, engaging directly with the data and intellectual advances under study.
Throughout, these intellectual and technical advances are paired with critical inquiry into the forces that encouraged and benefited from these new capabilities, i.e., the political dimension of data and technology.
Syllabus, Jupyter notebooks, and additional info can be found via https://data-ppf.github.io/
"Data: Past, Present, and Future" is supported by the Columbia University Collaboratory Fellows Fund. Jointly founded by Columbia University’s Data Science Institute and Columbia Entrepreneurship, The Collaboratory@Columbia is a university-wide program dedicated to supporting collaborative curricula innovations designed to ensure that all Columbia University students receive the education and training that they need to succeed in today’s data rich world.
Data: Past, Present, and Future (Lecture 1, Spring 2018) (chris wiggins)
Slides from Lecture 1 of "Data: Past, Present, and Future", Jan 17 2018.
New class on how data is impacting our professional, political, and personal realities. Taught by Profs Matt Jones and Chris Wiggins
lean + design thinking in building data products (chris wiggins)
talk given to "Columbia startup teams including 2016 CVC winners, Ignition Grant winners, TFP fellows, and ASCENT fellows" as part of a mini-bootcamp for founders at Columbia's engineering school, 2016-05-24
data science @NYT; inaugural Data Science Initiative Lecture (chris wiggins)
inaugural Data Science Initiative Lecture @ Brown University
2015-12-04
https://www.eventbrite.com/e/data-science-at-the-new-york-times-tickets-19490272931
data science history / data science @ NYT (chris wiggins)
talk delivered 2015-07-29 at ICERM workshop on "mathematics in data science"
workshop: https://icerm.brown.edu/topical_workshops/tw15-6-mds/
references: http://bit.ly/icerm
Embracing GenAI - A Strategic Imperative (Peter Windle)
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity, with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, and policies were put in place. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessment, leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Palestine last event orientationfvgnh .pptx (RaedMohamed3)
An EFL lesson about current events in Palestine, intended for intermediate students who wish to improve their listening skills through a short PowerPoint lesson.
Biological screening of herbal drugs: Introduction and Need for Phyto-Pharmacological Screening; New Strategies for evaluating Natural Products; In vitro evaluation techniques for Antioxidant, Antimicrobial and Anticancer drugs; In vivo evaluation techniques for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardioprotective, Diuretic and Antifertility activity; Toxicity studies as per OECD guidelines.
Unit 8 - Information and Communication Technology (Paper I).pdf (Thiyagu K)
These slides describe the basic concepts of ICT, the basics of Email, Emerging Technology and Digital Initiatives in Education. The presentation aligns with the UGC Paper I syllabus.
Model Attribute Check Company Auto Property (Celine George)
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit www.vavaclasses.com
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor... (Levi Shapiro)
Letter from the Congress of the United States regarding Anti-Semitism, sent June 3rd to MIT President Sally Kornbluth and MIT Corp Chair Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf (TechSoup)
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Introduction to AI for Nonprofits with Tapp Network (TechSoup)
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
variational bayes in biophysics
1. bio+stats vbem/networks hierarchical
variational and hierarchical modeling
for biological data
chris wiggins
columbia
april 23, 2012
chris.wiggins@columbia.edu 4/23/12
Chris Wiggins
• APAM: Department of Applied Physics and Applied Mathematics;
• C2B2: Center for Computational Biology and Bioinformatics;
• CISB: Columbia University Initiative in Systems Biology;
• ISDE: Institute for Data Sciences and Engineering
Columbia University
September 28, 2012
2. thanks. . .
- jake hofman (vbmod,vbfret)
- jonathan bronson (vbfret)
- jan-willem van de meent (hfret)
- ruben gonzalez (vbfret, hfret)
for more info:
- vbfret.sourceforge.net
- vbmod.sourceforge.net
- hfret.sourceforge.net (soon)
BMC Bioinformatics, 2010;
PNAS, 2009;
Biophysical Journal, 2009
3. outline:
1 biology and statistics
- genomics
- generative modeling
2 variational/biological networks
- variational Bayesian expectation maximization
- inference
- model selection
3 hierarchical/time series
- biological challenges
- inference
- model selection
8. introduction:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Hartwell, Hopfield, Leibler and Murray, Nature 402 (supplement), 2 December 1999
9. motivation:
community detection in networks:
- social networks
- biological networks
problem: over-fitting/resolution limit
10. history:
by community:
- math/cs: spectral methods (Fiedler ’74, Shi + Malik ’00)
- math/cs: clustering generally (Taskar, Koller, Getoor)
- physics: modularity
common thread: test w/ stochastic block model (’76, ’83)
ergo: use as inference tool (Hastings 0604429, Newman + Leicht 061148)
11. formulation:
- generative model
- maximum likelihood
- maximum evidence
- complexity control. . .
- variational/mft. . .
- algorithm
in physics: “test hamiltonian”
in ML: “variational Bayesian methods” (Jordan, MacKay)
12. generative model:
foreach node, roll a K-sided die with bias π to choose $z_i \in \{1, \dots, K\}$
foreach edge, flip a coin with bias ϑ+ if $z_i = z_j$, else ϑ−
draw the edge if the coin lands heads up
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
[figure: graphical model with nodes π, z_i, z_j, θ and A_ij; plate over pairs i≠j]
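To make the die rolling and coin flipping concrete, here is a minimal Python sketch that samples (z, A) from the two-parameter block model just described. It assumes an undirected graph without self-loops (one coin per pair i < j) and labels modules 0..K-1; the function name and its arguments are hypothetical, not part of vbmod.

```python
import random


def sample_block_model(N, pi, theta_in, theta_out, seed=None):
    """Sample module labels z and adjacency matrix A (hypothetical helper).

    pi: probabilities of the K-sided die; theta_in / theta_out: coin biases
    for pairs in the same / different modules (the slide's ϑ+ and ϑ−).
    """
    rng = random.Random(seed)
    K = len(pi)
    # roll the K-sided die once per node
    z = rng.choices(range(K), weights=pi, k=N)
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            bias = theta_in if z[i] == z[j] else theta_out
            if rng.random() < bias:  # coin lands heads up: draw the edge
                A[i][j] = A[j][i] = 1
    return z, A
```

With theta_in = 1 and theta_out = 0 the sampled graph is exactly a union of cliques, one per occupied module, which makes a convenient sanity check; intermediate biases give the noisy modular graphs that the inference has to untangle.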
13. generative model. . . (bis)
Die rolling, coin flipping, and priors, where the counts are:
- non-edges within modules
- edges within modules
- edges between modules
- non-edges between modules
- nodes in each module
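Writing the counts above as c_+ (edges within modules), c_- (non-edges within), d_+ (edges between), d_- (non-edges between) and n_μ (nodes in module μ), and θ_c, θ_d for the within- and between-module coin biases ϑ+ and ϑ− (these symbol names are our assumption), the die rolls and coin flips combine into the joint probability below. The priors are left out because the slide's choice of them is not recoverable here.

```latex
\begin{equation}
p(A, \vec z \mid \vec\pi, \theta_c, \theta_d)
  = \theta_c^{\,c_+} (1 - \theta_c)^{\,c_-} \,
    \theta_d^{\,d_+} (1 - \theta_d)^{\,d_-}
    \prod_{\mu=1}^{K} \pi_\mu^{\,n_\mu}
\end{equation}
```

Taking −ln of this expression and collecting the terms that depend on whether z_i = z_j gives the Potts-model form of the max likelihood slide, up to terms that do not depend on z.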
14. max likelihood:
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)
• Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
$$H \equiv -\ln p(A, \vec z \mid \vec\pi, \vec\theta) = -\sum_{i<j} \left( J_L A_{ij} - J_G \right) \delta_{z_i, z_j} + \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{z_i, \mu} + (\text{terms independent of } \vec z)$$
$$J_G \equiv \ln \frac{1 - \theta_d}{1 - \theta_c}, \qquad J_L \equiv \ln \frac{\theta_c}{\theta_d} + J_G, \qquad h_\mu \equiv -\ln \pi_\mu$$
where θ_c (θ_d) is the edge probability within (between) modules.
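A quick numerical check of the Potts-model rewriting, as a Python sketch: it computes −ln p(A, z | π, θ) directly from the coin flips and compares it with the Potts form using J_G ≡ ln[(1−θ_d)/(1−θ_c)], J_L ≡ ln(θ_c/θ_d) + J_G and h_μ ≡ −ln π_μ, summing over unordered pairs i < j. The two should differ only by the z-independent constant −Σ_{i<j}[A_ij ln θ_d + (1−A_ij) ln(1−θ_d)]. All function names are hypothetical.

```python
import math
import random


def neg_log_joint(A, z, pi, theta_c, theta_d):
    """-ln p(A, z | pi, theta), straight from the die rolls and coin flips (pairs i < j)."""
    N = len(z)
    total = -sum(math.log(pi[m]) for m in z)
    for i in range(N):
        for j in range(i + 1, N):
            t = theta_c if z[i] == z[j] else theta_d
            total -= math.log(t) if A[i][j] else math.log(1 - t)
    return total


def potts_energy(A, z, pi, theta_c, theta_d):
    """H = -sum_{i<j} (J_L A_ij - J_G) delta(z_i, z_j) + sum_i h_{z_i}, h_mu = -ln pi_mu."""
    J_G = math.log((1 - theta_d) / (1 - theta_c))
    J_L = math.log(theta_c / theta_d) + J_G
    N = len(z)
    H = -sum(math.log(pi[m]) for m in z)
    for i in range(N):
        for j in range(i + 1, N):
            if z[i] == z[j]:
                H -= J_L * A[i][j] - J_G
    return H


def random_graph(N, p, seed=0):
    """Symmetric 0/1 adjacency matrix with independent edges (toy input)."""
    rng = random.Random(seed)
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            A[i][j] = A[j][i] = int(rng.random() < p)
    return A
```

Because the gap is the same for every z, maximizing the likelihood over assignments at fixed θ and π is the same as minimizing the Potts energy H.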
17. max evidence:
[figure: increasing complexity]
18. max evidence:
http://research.microsoft.com/~minka/statlearn/demo/
19. max evidence:
20. max evidence:
cf. “BIC” Schwarz, 1978
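The complexity-control point behind maximum evidence, and its BIC approximation, can be seen on the smallest possible example: a coin. The Python sketch below (hypothetical names; a Beta(a, b) prior on the bias, uniform by default) compares the exact log evidence of a "free" coin with that of a fixed fair coin, and with the BIC approximation ln L_max − (k/2) ln n for k = 1 free parameter.

```python
import math


def log_beta(x, y):
    """ln B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_evidence_free_coin(n, h, a=1.0, b=1.0):
    """ln p(data | free coin): h heads in n flips, bias ~ Beta(a, b) integrated out exactly."""
    return log_beta(h + a, n - h + b) - log_beta(a, b)


def log_evidence_fair_coin(n, h):
    """ln p(data | fair coin): no free parameters, so the evidence is just the likelihood."""
    return n * math.log(0.5)


def bic_approx(n, h):
    """ln L_max - (1/2) ln n for the free coin; requires 0 < h < n."""
    p = h / n
    return h * math.log(p) + (n - h) * math.log(1 - p) - 0.5 * math.log(n)
```

With 600 heads in 1000 flips the free coin has the higher evidence; with 510 heads the fair coin wins even though the free coin's maximum likelihood is higher, because the evidence averages the likelihood over the prior instead of taking its maximum. The exact and BIC values differ by an O(1) term that BIC drops.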
27. max evidence:
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2004 & 2006)
• Die rolling, coin flipping <-> infinite-range spin-glass Potts model
• Infer distributions over spin assignments, coupling constants, and chemical potentials, and find the number of occupied spin states:
$$p(A \mid K) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta \; p(A, \vec z, \vec\theta, \vec\pi) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta \; e^{-H} \, p(\vec\pi) \, p(\vec\theta)$$
Can do the integrals, but the sum is intractable, $O(K^N)$; use mean-field.
28. max evidence:
• Gibbs’/Jensen’s inequality (the log of an expected value is an upper bound on the expected value of the log), for any distribution q
Variational Bayes (MacKay, Jordan, Ghahramani, Jaakkola, Saul 1999; cf. Feynman 1972)
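Spelled out for the evidence p(A|K) of the preceding max evidence slides, Jensen's inequality gives the bound below. This is a sketch: we write q for q(z, π, θ) and define F[q] as minus the bound, a sign convention we assume (it is consistent with the later vbmod/vbsbm comparison, where the model with the lower F wins).

```latex
\begin{align}
\ln p(A \mid K)
  &= \ln \sum_{\vec z} \int d\vec\pi \, d\vec\theta \;
     q \, \frac{p(A, \vec z, \vec\pi, \vec\theta)}{q}
  \;\ge\; \sum_{\vec z} \int d\vec\pi \, d\vec\theta \;
     q \ln \frac{p(A, \vec z, \vec\pi, \vec\theta)}{q}
  \;\equiv\; -F[q], \\
\ln p(A \mid K) + F[q]
  &= D_{\mathrm{KL}}\!\left[ q(\vec z, \vec\pi, \vec\theta) \,\middle\|\,
     p(\vec z, \vec\pi, \vec\theta \mid A, K) \right] \;\ge\; 0 .
\end{align}
```

Since ln p(A|K) does not depend on q, minimizing F over q both tightens the bound on the evidence and pulls q toward the true posterior.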
29. max evidence:
why would you do this? (A1):
Beal, 2003
30. max evidence:
why would you do this? (A2):
Beal, 2003
31. max evidence:
why would you do this? (A3):
Beal, 2003
32. max evidence:
• F is a functional of q; find an approximation to the posterior by optimizing an approximation to the evidence
• Take $q(\vec z, \vec\pi, \vec\theta) = q(\vec z)\, q(\vec\pi)\, q(\vec\theta)$, where $Q_{i\mu}$ is the probability that node i is in module μ, and the expected counts are:
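Under the factorized q, nodes i and j share a module with probability Σ_μ Q_iμ Q_jμ, so the expected counts can be written as below. This is a sketch that assumes the counts run over unordered pairs i < j and uses our own symbols c_±, d_± (edges and non-edges within and between modules) and M for the number of edges; the slide's own expressions are not recoverable.

```latex
\begin{align}
\langle c_+ \rangle &= \sum_{i<j} A_{ij} \sum_{\mu=1}^{K} Q_{i\mu} Q_{j\mu}, &
\langle c_- \rangle &= \sum_{i<j} (1 - A_{ij}) \sum_{\mu=1}^{K} Q_{i\mu} Q_{j\mu}, \\
\langle d_+ \rangle &= M - \langle c_+ \rangle, &
\langle d_- \rangle &= \tbinom{N}{2} - M - \langle c_- \rangle, \\
\langle n_\mu \rangle &= \sum_{i=1}^{N} Q_{i\mu}, &
M &= \sum_{i<j} A_{ij}.
\end{align}
```

Because ln p(A, z | π, θ) is linear in these counts, they are the only statistics of q(z) that the q(θ) and q(π) updates need, which is why each mean-field sweep is polynomial in N rather than O(K^N).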
33. algo:
where the expected counts are:
35. algo:
36. algo:
suggests a hard limit in step 3; sparse in step 1
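A minimal pure-Python sketch of one way to iterate the mean-field updates for the two-parameter block model. Assumptions, all ours rather than vbmod's: Beta(2, 1) and Beta(1, 2) priors on θ_c and θ_d, a Dirichlet(1) prior on π, fixed iteration counts, and initializing q(θ), q(π) at their priors so that Q is updated first. The Q update is ln Q_iμ = ⟨ln π_μ⟩ + Σ_{j≠i} Q_jμ(⟨J_L⟩A_ij − ⟨J_G⟩) + const, with digamma expectations under the Beta and Dirichlet posteriors. It does not compute F (so it cannot compare restarts or values of K), and it makes no attempt to reproduce the step numbering or the sparse/hard-limit optimizations noted on the slide.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def expected_couplings(cp, cm, dp, dm, n, a, b, n0):
    """<J_L>, <J_G>, <ln pi_mu> under q(theta_c) = Beta(cp + a0, cm + a1),
    q(theta_d) = Beta(dp + b0, dm + b1), q(pi) = Dirichlet(n + n0).
    All-zero counts give the prior expectations."""
    sc, sd = cp + cm + a[0] + a[1], dp + dm + b[0] + b[1]
    ln_tc, ln_1tc = digamma(cp + a[0]) - digamma(sc), digamma(cm + a[1]) - digamma(sc)
    ln_td, ln_1td = digamma(dp + b[0]) - digamma(sd), digamma(dm + b[1]) - digamma(sd)
    J_G = ln_1td - ln_1tc
    J_L = ln_tc - ln_td + J_G
    total = sum(n) + len(n) * n0
    return J_L, J_G, [digamma(nm + n0) - digamma(total) for nm in n]


def vb_modules(A, K, outer=20, inner=5, seed=0, a=(2.0, 1.0), b=(1.0, 2.0), n0=1.0):
    """Coordinate-ascent mean-field VB (hypothetical helper); Q[i][mu] ~ p(z_i = mu | A)."""
    rng = random.Random(seed)
    N = len(A)
    Q = []
    for _ in range(N):
        row = [rng.random() + 1e-3 for _ in range(K)]
        Q.append([r / sum(row) for r in row])
    n_edges = sum(A[i][j] for i in range(N) for j in range(i + 1, N))
    n_pairs = N * (N - 1) / 2
    # q(theta) and q(pi) start at their priors
    J_L, J_G, ln_pi = expected_couplings(0, 0, 0, 0, [0.0] * K, a, b, n0)
    for _ in range(outer):
        for _ in range(inner):  # q(z) update, one node at a time
            for i in range(N):
                logits = [ln_pi[m] + sum(Q[j][m] * (J_L * A[i][j] - J_G)
                                         for j in range(N) if j != i)
                          for m in range(K)]
                top = max(logits)
                w = [math.exp(v - top) for v in logits]
                Q[i] = [x / sum(w) for x in w]
        # expected counts, then the q(theta) and q(pi) updates
        cp = cm = 0.0
        for i in range(N):
            for j in range(i + 1, N):
                same = sum(Q[i][m] * Q[j][m] for m in range(K))
                if A[i][j]:
                    cp += same
                else:
                    cm += same
        n = [sum(Q[i][m] for i in range(N)) for m in range(K)]
        J_L, J_G, ln_pi = expected_couplings(cp, cm, n_edges - cp,
                                             n_pairs - n_edges - cm, n, a, b, n0)
    return Q


def two_cliques(size):
    """Adjacency matrix of two disjoint cliques of `size` nodes (a toy test graph)."""
    N = 2 * size
    return [[1 if i != j and i // size == j // size else 0 for j in range(N)]
            for i in range(N)]
```

Running vb_modules(two_cliques(6), K=4) illustrates the automatic complexity control of the consistency slides on a toy graph: the two cliques end up in two modules and the occupation of the other two modules is driven toward zero.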
37. results:
- run time
- consistency
- required plot: good vs. easy
- real data
  - karate
  - biology
  - american football
38. run time:
• Main loop runtime for $10^4$ nodes in MATLAB: ~30 seconds
39. consistency:
40. consistency:
[figure: p(θ+) and p(θ−) vs. θ; N=8, K=2, distribution after 2 iterations]
41. consistency:
• K=4?
• Automatic complexity control: the probability of occupation for extraneous modules goes to zero
44. consistency:
The “resolution limit” problem
[figure, left: inferred K* vs. K_true (10 to 20) for variational Bayes and modularity optimization, with the line K* = K_true; right: GN modularity vs. K_true for the “resolution limit problem on ring of 4-node cliques”, single-clique communities (correct) vs. double-clique communities (incorrect), GN modularity from Clauset’s algorithm]
Girvan-Newman modularity, or a Potts model with fixed parameters, suffers from a resolution limit: the size of the detected modules depends on the size of the network
Fortunato et al. (2007), Kumpula et al. (2007)
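The ring-of-cliques example on this slide can be checked by hand. A Python sketch, assuming each 4-node clique is joined to the next one by a single edge (the exact construction behind the figure is not stated): it builds the ring and computes Girvan-Newman modularity for the single-clique and double-clique partitions. Function names are hypothetical.

```python
from collections import defaultdict


def ring_of_cliques(n_cliques, clique_size=4):
    """Edges of n_cliques complete subgraphs joined in a ring by single edges."""
    edges = []
    for c in range(n_cliques):
        base = c * clique_size
        for a in range(clique_size):
            for b in range(a + 1, clique_size):
                edges.append((base + a, base + b))  # edges inside clique c
        nxt = ((c + 1) % n_cliques) * clique_size
        edges.append((base + clique_size - 1, nxt))  # link to the next clique
    return edges


def modularity(edges, labels):
    """Girvan-Newman modularity Q = sum_c [l_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    inside = defaultdict(int)  # l_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

In this toy construction the closed forms are Q_single = 6/7 − 1/n and Q_double = 13/14 − 2/n for n cliques, so merging pairs of cliques scores higher once n > 14, even though each clique is the obvious module.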
45. good vs easy:
47. real data:
[figure: APS March Meeting 2008; labeled modules: superconductivity (experimentalists), nanotubes, graphene, superconductivity (theorists)]
48. extensions:
model extensions
- full SBM (done)
- hierarchical model $p(A_{ij} = 1 \mid z_i = z_j \pm 1)$
- hierarchical modeling (ensemble of graphs): $p(D \mid u, K) = \prod_{i=1}^{L} \int d\vec\vartheta_i \, p(D_i \mid \vec\vartheta_i) \, p(\vec\vartheta_i \mid u, K)$
- $\mathbb{R}^d$ embedding (latent variables are real-valued)
- more ‘rigorous’ SOM?
- ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
- BP (see earlier talks)
- map-reduce
- cvmod (model selection via cross validation)
49. full SBM. . .
probability of an edge depends only on block membership:
$p(A_{ij} = 1 \mid z_i = \mu, z_j = \nu) = \vartheta_{\mu\nu}$
50. full SBM. . .
• Nodes belong to “blocks” of varying size
• Roll a die to assign nodes to blocks
• Probability of an edge between two nodes depends only on block membership
• Flip (one of $K^2$) coins for edges
• Result: mixture of Erdős-Rényi graphs
[figure: adjacency matrix, nz = 2275]
51. full SBM. . .
vs
52. full SBM. . .
[figure: adjacency matrix, nz = 2803]
53. full SBM. . .
vs
54. full SBM. . .
>> vbsbm_vs_vbmod(0)
running vbmod ...
Elapsed time is 1.136925 seconds.
running vbsbm ...
Elapsed time is 1.398904 seconds.
Fmod=13089.158019 Fsbm=13144.445782
vbmod wins
55. full SBM. . .
>> vbsbm_vs_vbmod(0.25)
running vbmod ...
Elapsed time is 1.557298 seconds.
running vbsbm ...
Elapsed time is 1.759527 seconds.
Fmod=20457.142416 Fsbm=19457.306022
vbsbm wins
56. full SBM. . .
>> vbsbm_vs_vbmod(0.5)
running vbmod ...
Elapsed time is 2.624886 seconds.
running vbsbm ...
Elapsed time is 1.440242 seconds.
Fmod=26133.351210 Fsbm=23921.797625
vbsbm wins
57. full SBM. . .
• Using the same framework, we can compare the constrained and full (unconstrained) stochastic block models via $p(D \mid M, K^*)$
[figure: win percentage for the unconstrained model vs. perturbation to the constrained model (0 to 0.1), two panels; example adjacency matrices with nz = 2100, 2048 and 2108]
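A much simpler relative of the vbmod-vs-vbsbm comparison above: if the module assignments were known, the evidence of each model would be a product of Beta-Bernoulli integrals, one per coin. The Python sketch below (K = 2, uniform Beta(1, 1) priors by default, hypothetical names; not how vbsbm_vs_vbmod computes F) compares the constrained model, with one within-module coin and one between-module coin, against the full SBM, with one coin per block pair. Counts are given as (edges, non-edges).

```python
import math


def log_beta(x, y):
    """ln B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_evidence(groups, a=1.0, b=1.0):
    """Sum of Beta-Bernoulli log evidences; each (edges, non_edges) group shares one coin."""
    return sum(log_beta(e + a, ne + b) - log_beta(a, b) for e, ne in groups)


def compare_two_blocks(within_1, within_2, between):
    """Return (constrained, full) log evidence for K = 2 with known assignments."""
    pooled = (within_1[0] + within_2[0], within_1[1] + within_2[1])
    constrained = log_evidence([pooled, between])    # theta_c, theta_d
    full = log_evidence([within_1, within_2, between])  # theta_11, theta_22, theta_12
    return constrained, full
```

When both blocks have the same internal density (say 27 edges out of 45 pairs each) the constrained model wins by a little over one nat, since the extra parameter is not paid for; when the densities differ strongly (40 vs. 5 edges out of 45) the full model wins by a wide margin.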
64. for more info. . .
code: MATLAB & python (inc. “full” SBM) (vbmod.sf.net)
paper: arxiv 08 / prl 08
Hofman soon to come (not by me):
- code in C++, inc. full ‘vblabel propagation’ algo
- twitter-scale analysis
66. hfret. . .
Jan-Willem van de Meent, Ruben Gonzalez, Chris Wiggins
Columbia University
72. hfret. . .
Tinoco and Gonzalez, Genes Dev, 2011
73. hfret. . .
[figure: unbound and EF-G bound states]
Tinoco and Gonzalez, Genes Dev, 2011; Fei et al., PNAS, 2009
(short-lived GS1 states correspond to an EF-G + GDPNP binding event)
75. hfret. . .
1. Identify states
2. Estimate kinetic rates
3. Average over many time series
4. Detect subpopulations
80. hfret. . .
[figure: FRET signal and histogram]
Idea: find the probability of belonging to each state
82. hfret. . .
Log-likelihood:
$$L = \log p(x \mid \theta) = \log \sum_{z} p(x, z \mid \theta)$$
Expectation Maximization:
1. calculate $p(z \mid x, \theta_i)$
2. calculate $\theta_{i+1}$ from $p(z \mid x, \theta_i)$
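The two EM steps above, for the simplest version of the FRET-state problem: a mixture of K one-dimensional Gaussians fitted to the signal values, ignoring time order. This is a Python sketch under our own assumptions (Gaussian emissions, quantile initialization of the means, a small variance floor, hypothetical names), not the hfret or vbFRET code.

```python
import math
import random


def em_gaussian_mixture(x, K, iters=100):
    """EM for a 1-D mixture of K Gaussians; returns weights, means, variances."""
    N = len(x)
    xs = sorted(x)
    mu = [xs[int((k + 0.5) * N / K)] for k in range(K)]  # means spread over quantiles
    mean = sum(x) / N
    var = [sum((v - mean) ** 2 for v in x) / N] * K
    w = [1.0 / K] * K
    for _ in range(iters):
        # step 1 (E): responsibilities R[n][k] = p(z_n = k | x_n, theta_i)
        R = []
        for v in x:
            logp = [math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - (v - mu[k]) ** 2 / (2 * var[k]) for k in range(K)]
            top = max(logp)
            p = [math.exp(l - top) for l in logp]
            s = sum(p)
            R.append([pk / s for pk in p])
        # step 2 (M): theta_{i+1} maximizes the expected complete-data log-likelihood
        for k in range(K):
            Nk = sum(r[k] for r in R) + 1e-12  # guard against an empty component
            w[k] = Nk / N
            mu[k] = sum(r[k] * v for r, v in zip(R, x)) / Nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(R, x)) / Nk, 1e-6)
    return w, mu, var


if __name__ == "__main__":
    rng = random.Random(0)
    # synthetic "FRET efficiencies" from two states (illustrative values only)
    data = [rng.gauss(0.2, 0.05) for _ in range(200)] + [rng.gauss(0.8, 0.05) for _ in range(200)]
    print(em_gaussian_mixture(data, 2))
```

Because the mixture treats each time point independently, it can recover state occupancies but carries no information about transitions, which is the limitation the next slides address.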
83. hfret. . .
[figure: learned vs. truth]
Accurate for occupancy of states, not so good for rate estimates
85. hfret. . .
probability of state depends on previous state:
$p(z_{t+1} = l \mid z_t = k) = A_{kl}$
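With the state depending on the previous state, the sum over hidden paths in log p(x|θ) has K^T terms, but the forward recursion computes it in O(T K²). A Python sketch with per-step rescaling, assuming the emission log-probabilities ln p(x_t | z_t = k) are already computed; the brute-force version enumerates all paths and is included only as a check for tiny T. Names are hypothetical.

```python
import itertools
import math


def forward_log_likelihood(obs_loglik, A, p0):
    """ln p(x_1..x_T) for an HMM via the forward recursion with rescaling.

    obs_loglik[t][k] = ln p(x_t | z_t = k); A[k][l] = p(z_{t+1} = l | z_t = k);
    p0[k] = p(z_1 = k).
    """
    K = len(p0)
    total, alpha = 0.0, None
    for ll in obs_loglik:
        top = max(ll)
        emit = [math.exp(v - top) for v in ll]  # emissions rescaled by exp(-top)
        if alpha is None:
            alpha = [p0[k] * emit[k] for k in range(K)]
        else:
            alpha = [sum(alpha[k] * A[k][l] for k in range(K)) * emit[l] for l in range(K)]
        c = sum(alpha)
        total += top + math.log(c)  # undo both rescalings in log space
        alpha = [a / c for a in alpha]  # alpha[k] = p(z_t = k | x_1..x_t)
    return total


def brute_force_log_likelihood(obs_loglik, A, p0):
    """Sum over all K^T state paths explicitly; only feasible for tiny T."""
    K, T = len(p0), len(obs_loglik)
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        p = p0[path[0]] * math.exp(obs_loglik[0][path[0]])
        for t in range(1, T):
            p *= A[path[t - 1]][path[t]] * math.exp(obs_loglik[t][path[t]])
        total += p
    return math.log(total)
```

The rescaling keeps alpha normalized at every step, so long traces do not underflow; the accumulated log scale factors add up to the log-likelihood.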
93. hfret. . .
[figure: 2 states vs. 3 states]
94. hfret. . .
Log-likelihood:
$$L = \log p(x \mid \theta) = \log \sum_{z} p(x, z \mid \theta)$$
Log-evidence:
$$L = \log p(x \mid u) = \log \sum_{z} \int d\theta \, p(x, z \mid \theta) \, p(\theta \mid u)$$
97. hfret. . .
Log-evidence: the best model has the highest average likelihood (the likelihood averaged over the prior $p(\theta \mid u)$)
98. hfret. . .
Log-evidence:
$$\log p(x \mid u) = \log \sum_{z} \int d\theta \, p(x, z \mid \theta) \, p(\theta \mid u)$$
Lower bound:
$$\mathcal{L} = \sum_{z} \int d\theta \, q(z) \, q(\theta \mid w) \log \frac{p(x, z, \theta \mid u)}{q(z) \, q(\theta \mid w)} \le \log p(x \mid u)$$
with $q(z) \, q(\theta \mid w) \approx p(z, \theta \mid x)$
99. hfret. . .
Lower bound tight for true posterior:
$$\mathcal{L} = \sum_{z} \int d\theta \, p(z, \theta \mid x) \log \frac{p(x, z, \theta \mid u)}{p(z, \theta \mid x)} = \sum_{z} \int d\theta \, p(z, \theta \mid x) \log p(x \mid u) = \log p(x \mid u)$$
$$\mathcal{L} = \log p(x \mid u) - D_{\mathrm{KL}}\left[ q(z) \, q(\theta \mid w) \,\|\, p(z, \theta \mid x) \right]$$
118. hfret. . .
$\xi_{ntkl} = p(z_{nt} = k, z_{n,t+1} = l \mid x_n)$
1. Run mixture model on posterior counts:
$$p(\xi_n \mid A) = \prod_{t,k,l} A_{kl}^{\xi_{ntkl}}, \qquad p(\xi_n \mid u_m) = \int dA \, p(A \mid u_m) \, p(\xi_n \mid A)$$
2. Rerun with M × K block-diagonal form:
$$u_A \to \begin{pmatrix} u_A & & \\ & \ddots & \\ & & u_A \end{pmatrix} \quad (M \text{ blocks})$$
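Because p(ξ_n | A) depends on ξ only through the counts summed over t, the integral over A has a closed form when each row of A has an independent Dirichlet prior: a ratio of Gamma functions per row. A Python sketch under that assumption (rows A_k ~ Dirichlet(u_m[k]); hypothetical names; ξ may be fractional expected counts):

```python
import math


def log_evidence_transitions(xi, u):
    """ln p(xi | u) = ln ∫ dA p(A | u) prod_{k,l} A_kl^{xi_kl}, rows A_k ~ Dirichlet(u[k]).

    xi[k][l]: transition counts summed over t (may be fractional expectations);
    u[k][l]: Dirichlet parameters for row k.
    """
    total = 0.0
    for xi_k, u_k in zip(xi, u):
        total += math.lgamma(sum(u_k)) - math.lgamma(sum(u_k) + sum(xi_k))
        total += sum(math.lgamma(a + c) - math.lgamma(a) for a, c in zip(u_k, xi_k))
    return total
```

Evaluating this for each candidate prior u_m gives the per-subpopulation terms p(ξ_n | u_m) that a mixture over subpopulations would weight and combine.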
130. hfret. . .
[figure: low noise, underfitted; Inf Out - Inf In]
131. hfret. . .
[figure: low noise, correct; Out vs. In]
132. hfret. . .
[figure: low noise, overfitted; Inf Out - Inf In]
133. hfret. . .
[figure: high noise, underfitted; Inf Out - Inf In]
134. hfret. . .
[figure: high noise, correct; Inf Out - Inf In]
135. hfret. . .
[figure: high noise, overfitted; Inf Out - Inf In]
137. hfret. . .
the future, in progress:
139. traditional role of statistics in biophysics
“if your experiment needs statistics, you ought to have done a better experiment”
- lord rutherford