Matthew currently serves as Chief Technology Officer at Digital Reasoning, a fast-growing cognitive computing company that specializes in structuring unstructured data and making human language computable. Matthew’s passion extends to several entrepreneurial ventures including angel investment and start-up advisement for novel applications of machine learning and data science. Matthew is based in Nashville, TN and spends most of his time outside of work training for CrossFit, riding motorcycles, and listening to Hamilton.
Matthew's thesis is that data science is the fundamental methodology for operationalizing machine learning models. In his talk, Matthew will define and underscore the importance of the scientific method as part of his methodology and share what he believes is the most important key indicator for cultivating high-performing data science teams that produce results that translate into business value.
NYAI #7 - Top-down vs. Bottom-up Computational Creativity by Dr. Cole D. Ingr...
Rizwan Habib
Originally from the San Francisco Bay Area in California, American composer and multimedia artist Cole D. Ingraham holds a B.M. in Music Composition from the University of the Pacific, an M.F.A. in Electronic Music and Recording Media from Mills College, and a D.M.A. in Music Composition from the University of Colorado at Boulder. Currently Cole is living in Shanghai, China teaching music composition, theory, technology, and flute at FaceArt Institute of Music. He is an active performer, improviser, and creative programmer, both as a soloist and a collaborator. His aesthetic involves experimentalism, noise, drone, programming as performance, and all things abstract.
Since 2008 Ingraham has performed around the world as part of the international network laptop quartet Glitch Lich. As a group they have developed a large amount of software to allow them to perform their unique brand of audio/visual art in real time, with members distributed across the US, UK, and China. The quartet has been very active internationally with notable performances including the 2010 SuperCollider Symposium in Berlin, Germany, the 2011 and 2013 Network Music Festivals in Birmingham, UK, the 2012 International Computer Music Conference in Ljubljana, Slovenia, New Interfaces for Musical Expression in Seoul, Korea in 2013, and the first ever Algoraves in Tokyo, Japan in 2014.
Beyond the realm of acoustic instruments, he regularly writes software instruments and systems to realize his musical ideas. These range from stand-alone programs, to custom synthesized instruments and effects, to novel analogue and/or digital interfaces for controlling the software. Particularly of note is a custom app for the iPad called Un:Limit. This is a multitouch interface with a variable number of “virtual strings” and visual guides to aid in locating various tunings. The app also responds to the amount of surface area the performer’s finger is covering, adding an extra level of expression control. Un:Limit has been used in a number of performances of works written both specifically for it and those originally for other instruments.
Creative coding, programming as a means of creating art, is a central part of Ingraham’s work. This is not only true for his live coding performances, but for all aspects of his creative output. This stems from the idea that code is the most direct way to interact with the computer itself (Ingraham’s instrument of choice). This allows a great deal of flexibility and creativity not always present with pre-made software. Because of this, most of his creative work is created entirely from code, with minimal reliance on commercial software.
NYAI #5 - Fun With Neural Nets by Jason Yosinski
Rizwan Habib
Fun With Neural Nets - (Jason Yosinski, Researcher at Geometric Intelligence)
Jason Yosinski is a researcher at Geometric Intelligence, where he uses neural networks and machine learning to build better AI. He was previously a PhD student and NASA Space Technology Research Fellow working at the Cornell Creative Machines Lab, the University of Montreal, the Caltech Jet Propulsion Laboratory, and Google DeepMind. His work on AI has been featured on NPR, Fast Company, the Economist, TEDx, and on the BBC. When not doing research, Mr. Yosinski enjoys tricking middle school students into learning math while they play with robots.
Jason will talk about how deep neural networks have recently been making a bit of a splash, enabling machines to learn to solve problems that had previously been easy for humans but hard for machines, like playing Atari games or identifying lions or jaguars in photos. But how do these neural nets actually work? What do they learn? This turns out to be a surprisingly tricky question to answer — surprising because we built the darn things, but tricky because the networks are so large and have many millions of connections that effect complex computation which is hard to interpret. Trickiness notwithstanding, in this talk we’ll see what we can learn about neural nets by looking at a few examples of networks in action and experiments designed to elucidate network behavior.
Website: http://yosinski.com/
Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)
Matthew Russell
Sunday 9:55 a.m.–10:45 a.m.
Why Twitter Is All the Rage: A Data Miner's Perspective
Presenter: Matthew Russell
Audience level: Novice
Description:
In order to be successful, technology must amplify a meaningful aspect of our human experience, and Twitter’s success largely has been dependent on its ability to do this quite well. Although you could describe Twitter as just a “free, high-speed, global text-messaging service,” that would be to miss the much larger point that Twitter scratches some of the most fundamental itches of our humanity.
Abstract:
This talk explains why Twitter is "all the rage" by examining Twitter in light of fundamental questions about our humanity:
* We want to be heard
* We want to satisfy our curiosity
* We want it easy
* We want it now
This session examines how well Twitter addresses these questions and presents its underlying conceptual architecture as an interest graph.
Even if you have minimal programming skills, you'll come away empowered with the ability to think about data mining on Twitter in more effective ways and apply a powerful collection of easily adaptable recipes to fully exploit the 5 kilobytes of metadata that decorates those 140 characters that you commonly think of as a tweet. Learn how to access Twitter's API, search for tweets, discover trending topics, process tweets in real-time from the firehose, and much more.
NYAI #8 - HOLIDAY PARTY + NYC AI OVERVIEW with NYC's Chief Digital Officer Sr...
Rizwan Habib
Sree Sreenivasan is the Chief Digital Officer for the City of New York, where he works to promote access to City government through technology and support of the city's tech ecosystem.
Prior to his work at City Hall, Sreenivasan served for three years as the first Chief Digital Officer at the Metropolitan Museum of Art, where he led a 70-person team to increase the museum’s digital presence. In October 2015, he was appointed by Mayor de Blasio to the Commission on Public Information and Communication (COPIC), where he worked to increase access to, and education about, City information online.
Before his work at the Met, he spent 20 years as a member of faculty of the Columbia Journalism School and a year as Columbia University's first Chief Digital Officer. For four years, he taught an entrepreneurship class with Ken Lerer, co-founder and chairman of the Huffington Post and now head of Lerer Ventures. He has also taught media marketing classes at Columbia Business School. Because of his interest in startups, he has served as a mentor and adviser to dozens of new ventures, including many by his former students. At the same time, he teaches about intrapreneurship - how to foster innovation and create change within big organizations.
Prior to the Met, Sree was a founding member and contributing editor at neighborhood news site DNAinfo, and throughout his career, he has written for various publications, including the New York Times, and was a popular technology reporter on WABC-TV, WNBC-TV and WCBS-TV.
NYAI #9: Concepts and Questions As Programs by Brenden Lake
Rizwan Habib
Brenden studies computational problems that are easier for people than they are for machines. He received his Ph.D. in Cognitive Science from MIT in 2014, and his M.S. and B.S. in Symbolic Systems from Stanford University in 2009. He is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science. His recent research on Bayesian Program Learning has been covered by many media outlets (New York Times, Washington Post, etc.) and was selected by Scientific American as one of the most important advances of 2016.
Both cognitive science and AI can gain by studying the human solutions to difficult computational problems. Brenden's talk will focus on concept learning and question asking, two problems that people solve far better than machines. People can learn a new concept from fewer examples, and then use their concepts in richer ways -- for imagination, extrapolation, and explanation, not just classification. Moreover, learning is often an active process; people can ask rich and probing questions in order to reduce uncertainty, while algorithms for active learning ask simple and stereotyped queries. He will also discuss work on program induction as a cognitive model and potential solution for extracting richer concepts from less data, with applications to learning handwritten characters and learning recursive visual concepts from examples. Brenden will end with program synthesis as a model of question asking in simple games.
NYAI - Understanding Music Through Machine Learning by Brian McFee
Rizwan Habib
Understanding Music Through Machine Learning - (Brian McFee, Moore-Sloan Fellow at New York University's Center for Data Science)
Brian McFee is a Moore-Sloan Fellow at New York University's Center for Data Science. He received a B.S. degree in Computer Science from the University of California, Santa Cruz in 2003, and a Ph.D. in Computer Science and Engineering from the University of California, San Diego in 2012. His work touches on various topics at the intersection of machine learning, information retrieval, and audio analysis. He is a contributor to various open source projects, and a principal developer of the librosa package for music analysis in Python.
Brian will talk about how we can understand music through machine learning. Music signals share much in common with other high-dimensional data domains (e.g., images or speech), but the domain also imposes unique constraints which can inform the problem formulations and be exploited during the modeling process. In this talk, he will give two examples of recent work on statistical musical analysis: structure analysis and instrument recognition. He will provide pointers to data sets and open source implementations wherever possible.
NYAI - Commodity Machine Learning & Beyond by Andreas Mueller
Rizwan Habib
Commodity Machine Learning - (Andreas Mueller)
Recent years have seen widespread adoption of machine learning in industry and academia, impacting diverse areas from advertising to personalized medicine. As more and more areas adopt machine learning and data science techniques, the question arises of how much expertise is needed to successfully apply machine learning, data science and statistics. Not every company can afford a data science team, and no one can expect someone getting a PhD in biology to also have PhD-level expertise in computer science and statistics.
This talk will summarize recent progress in automating machine learning and give an overview of the tools currently available. It will also point out areas where the ecosystem needs to improve in order to allow a wider access to inference using data science techniques. Finally we will point out some open problems regarding assumptions, and limitations of what can be automated.
Andreas is a Research Engineer at the NYU Center for Data Science, building open source software for data science. Previously, he worked as a Machine Learning Scientist at Amazon, developing solutions for computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and has co-maintained it for several years.
His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.
The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
We will also cover how to build machine learning models on text data, and how to handle very large datasets.
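As a rough illustration of the cross validation idea mentioned in the tutorial description, the sketch below splits a toy dataset into k folds, trains on k-1 of them, and scores on the held-out fold. It is a standard-library-only sketch and does not use scikit-learn's API; the nearest-centroid classifier, the helper names (k_fold_indices, fit_centroids, predict, cross_val_accuracy), and the synthetic data are all assumptions made for this example.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in by_class.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist)

def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Toy data: two well-separated 2-D clusters, invented for illustration.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
     + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)])
y = [0] * 50 + [1] * 50

scores = cross_val_accuracy(X, y, k=5)
print("fold accuracies:", scores)
print("mean accuracy:", mean(scores))
```

Averaging the per-fold scores gives a less optimistic estimate than scoring on the training data, which is the usual motivation for cross validation in model selection.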
Privacy, Ethics, and Future Uses of the Social Web
Matthew Russell
A presentation to the Owen Graduate School of Management (Vanderbilt University) about social media and some of the technology behind the future uses of social media that are likely to shape the future of the Web as we know it.
Lessons Learned from Running Hundreds of Kaggle Competitions
Ben Hamner
At Kaggle, we’ve run hundreds of machine learning competitions and seen over 80,000 data scientists make submissions. One thing is clear: winning competitions isn’t random. We’ve learned that certain tools and methodologies work consistently well on different types of problems. Many participants make common mistakes (such as overfitting) that should be actively avoided. Similarly, competition hosts have their own set of pitfalls (such as data leakage).
In this talk, I’ll share what goes into a winning competition toolkit along with some war stories on what to avoid. Additionally, I’ll share what we’re seeing on the collaborative side of competitions. Our community is showing an increasing amount of collaboration in developing machine learning models and analytic solutions. I’ll showcase examples of this and discuss how these types of collaboration will improve how data science is learned and applied.
Mining Social Web APIs with IPython Notebook (PyCon 2014)
Matthew Russell
From the tutorial description at https://us.pycon.org/2014/schedule/presentation/134/ -
Description
Social websites such as Twitter, Facebook, LinkedIn, Google+, and GitHub have vast amounts of valuable insights lurking just beneath the surface, and this workshop minimizes the barriers to exploring and mining this valuable data by presenting turn-key examples from the thoroughly revised 2nd Edition of Mining the Social Web.
Abstract
This workshop teaches you fundamental data mining techniques as applied to popular social websites by adapting example code from Mining the Social Web (2nd Edition, O'Reilly 2013) in a tutorial-style step-by-step manner that is designed specifically to accommodate attendees with very little programming or domain experience. This workshop's extensive use of IPython Notebook facilitates interactive learning with turn-key examples against a Vagrant-based virtual machine that takes care of installing all 3rd party dependencies that are needed. The barriers to entry are truly minimal, which allows maximal use of the time to be spent on interactive learning.
The workshop is somewhat broadly designed and acclimates you to mining social data from Twitter, Facebook, LinkedIn, Google+, and GitHub APIs in five corresponding modules with the following memorable approach for each of them:
* Aspire - Set out to answer a question or test a hypothesis as part of a data science experiment
* Acquire - Collect and store the data that you need to answer the question or test the hypothesis
* Analyze - Use fundamental data mining techniques to explore and exploit the data
* Summarize - Present analytical findings in a compact and meaningful way
Each module consists of a brief period in which each attendee will customize the corresponding notebook for the module with their own account credentials with the remainder of the module devoted to learning what data is available from the API and exercises demonstrating analysis of the data—all from a pre-populated IPython Notebook. Time will be set aside at the end of each module for attendees to hack on the code, discuss examples, and ask any lingering questions.
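The four-step approach above (Aspire, Acquire, Analyze, Summarize) can be sketched on a toy problem. This is only an illustration under stated assumptions: it is not code from Mining the Social Web, the acquire() function is a hypothetical stand-in for a real API call, and the sample posts are invented.

```python
import re
from collections import Counter

# Aspire: the question we set out to answer.
QUESTION = "Which hashtags appear most often in a sample of posts?"

def acquire():
    """Acquire: return the data needed to answer the question.

    Hypothetical stand-in for an API call; these posts are invented.
    """
    return [
        "Loving the #python tutorials at #pycon",
        "Notebook demos are great for teaching #python",
        "Graph analysis of follower networks #datamining #python",
        "Slides from my #pycon talk are up",
    ]

def analyze(posts):
    """Analyze: count hashtags across all posts, case-insensitively."""
    counts = Counter()
    for post in posts:
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", post))
    return counts

def summarize(counts, top_n=3):
    """Summarize: format the top_n hashtags as aligned text lines."""
    return "\n".join(f"#{tag:<12}{n}" for tag, n in counts.most_common(top_n))

counts = analyze(acquire())
print(QUESTION)
print(summarize(counts))
```

Keeping each step in its own function means the acquire() stand-in could be swapped for a real data source without touching the analysis or the summary.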
NYAI - Scaling Machine Learning Applications by Braxton McKee
Rizwan Habib
Scaling Machine Learning Systems - (Braxton McKee, CEO & Founder, Ufora)
Braxton is the technical lead and founder of Ufora, a software company that develops Pyfora, an automatically parallel implementation of the Python programming language that enables data science and machine-learning at scale. Before founding Ufora with backing from Two Sigma Ventures and others, Braxton led the ten-person MBS/ABS Credit Modeling team at Ellington Management Group, a multi-billion dollar mortgage hedge fund. He holds a BS (Mathematics), MS (Mathematics), and M.B.A. from Yale University.
Braxton will discuss scaling machine learning applications using the open-source platform Pyfora. He will describe both the general approach and also some specific engineering techniques employed in the implementation of Pyfora that make it possible to produce large-scale machine learning and data science programs directly from single-threaded Python code.
NYAI - Interactive Machine Learning by Daniel Hsu
Rizwan Habib
Interactive learning - (Daniel Hsu)
Daniel Hsu is an assistant professor in the Department of Computer Science and a member of the Data Science Institute, both at Columbia University. Previously, he was a postdoc at Microsoft Research New England, and the Departments of Statistics at Rutgers University and the University of Pennsylvania. He holds a Ph.D. in Computer Science from UC San Diego, and a B.S. in Computer Science and Engineering from UC Berkeley. He received a 2014 Yahoo ACE Award, was selected by IEEE Intelligent Systems as one of "AI's 10 to Watch" in 2015, and received a 2016 Sloan Research Fellowship.
Daniel's research interests are in algorithmic statistics, machine learning, and privacy. His work has produced the first computationally efficient algorithms for numerous statistical estimation tasks (including many involving latent variable models such as mixture models, hidden Markov models, and topic models), provided new algorithmic frameworks for tackling interactive machine learning problems, and led to the creation of highly-scalable tools for machine learning applications.
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit...
Rizwan Habib
A Path To Unsupervised Learning Through Adversarial Networks - (Soumith Chintala, Researcher at Facebook AI Research)
Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. He holds a Masters in CS from NYU, and spent time in Yann LeCun's NYU lab building deep learning models for pedestrian detection, natural image OCR, depth-images among others.
Soumith will go over generative adversarial networks, a particular way of training neural networks to build high quality generative models. The talk will take you through an easy to follow timeline of the research and improvements in adversarial networks, followed by some future directions, as well as applications.
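For orientation, the abstract does not state a formula, but the standard minimax objective from the original generative adversarial network formulation (Goodfellow et al., 2014) can be written as below. Here G is the generator, D is the discriminator, p_data is the data distribution, and p_z is the noise prior; later variants discussed in the talk may use different losses.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is trained to raise V by telling real samples from generated ones, while the generator is trained to lower it by producing samples the discriminator accepts as real.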
Using deep neural networks for fashion applications
Ahmad Qamar
Talk abstract:
Deep learning has been a popular and powerful approach for solving computer vision problems in recent years. As web and social media content shifts towards rich-media, deep learning can be used to tackle the problem of understanding images to better capture user's fashion preferences. In this talk we take a closer look at convolutional neural networks used for detecting, tagging, and indexing fashion images. We'll also cover related work in the area, illustrate a wide range of applications, discuss challenges and merits of domain-specific deep learning models, and touch upon future work.
Thread Genius is a NYC-based Techstars-backed visual search and recommendation platform for fashion content. Use the full suite of Thread Genius APIs to index and identify clothing within UGC photos, find visually similar alternatives, or recommendations on how to complete the look. Find out more at threadgenius.co
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
NYAI - Commodity Machine Learning & Beyond by Andreas Mueller - Rizwan Habib
Commodity Machine Learning - (Andreas Mueller)
Recent years have seen a widespread adoption of machine learning in industry and academia, impacting diverse areas from advertisement to personal medicine. As more and more areas adopt machine learning and data science techniques, the question arises of how much expertise is needed to successfully apply machine learning, data science and statistics. Not every company can afford a data science team, and if you are getting your PhD in biology, no one can expect you to have PhD-level expertise in computer science and statistics.
This talk will summarize recent progress in automating machine learning and give an overview of the tools currently available. It will also point out areas where the ecosystem needs to improve in order to allow wider access to inference using data science techniques. Finally we will point out some open problems regarding assumptions, and limitations of what can be automated.
Andreas is a Research Engineer at the NYU Center for Data Science, building open source software for data science. Previously, he worked as a Machine Learning Scientist at Amazon, developing solutions for computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and has co-maintained it for several years.
His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.
The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
We will also cover how to build machine learning models on text data, and how to handle very large datasets.
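As a rough illustration of the cross-validation idea mentioned in the tutorial description, here is a minimal sketch in plain Python. It does not use scikit-learn's API; the nearest-centroid classifier, the helper names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_val_accuracy`) and the synthetic two-cluster data are all assumptions made up for this example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def cross_val_accuracy(X, y, k=5, seed=0):
    """Hold out each fold once, train on the rest, and score the held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Synthetic data (hypothetical): two well-separated 2-D clusters.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

scores = cross_val_accuracy(X, y, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean(scores))
```

Every sample is scored exactly once by a model that never saw it during training, which is what makes the averaged score a fairer estimate than accuracy on the training data.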
Privacy, Ethics, and Future Uses of the Social Web - Matthew Russell
A presentation to the Owen Graduate School of Management (Vanderbilt University) about social media and some of the technology behind the future uses of social media that are likely to shape the future of the Web as we know it.
Lessons Learned from Running Hundreds of Kaggle Competitions - Ben Hamner
At Kaggle, we’ve run hundreds of machine learning competitions and seen over 80,000 data scientists make submissions. One thing is clear: winning competitions isn’t random. We’ve learned that certain tools and methodologies work consistently well on different types of problems. Many participants make common mistakes (such as overfitting) that should be actively avoided. Similarly, competition hosts have their own set of pitfalls (such as data leakage).
In this talk, I’ll share what goes into a winning competition toolkit along with some war stories on what to avoid. Additionally, I’ll share what we’re seeing on the collaborative side of competitions. Our community is showing an increasing amount of collaboration in developing machine learning models and analytic solutions. I’ll showcase examples of this and discuss how these types of collaboration will improve how data science is learned and applied.
Mining Social Web APIs with IPython Notebook (PyCon 2014) - Matthew Russell
From the tutorial description at https://us.pycon.org/2014/schedule/presentation/134/ -
Description
Social websites such as Twitter, Facebook, LinkedIn, Google+, and GitHub have vast amounts of valuable insights lurking just beneath the surface, and this workshop minimizes the barriers to exploring and mining this valuable data by presenting turn-key examples from the thoroughly revised 2nd Edition of Mining the Social Web.
Abstract
This workshop teaches you fundamental data mining techniques as applied to popular social websites by adapting example code from Mining the Social Web (2nd Edition, O'Reilly 2013) in a tutorial-style step-by-step manner that is designed specifically to accommodate attendees with very little programming or domain experience. This workshop's extensive use of IPython Notebook facilitates interactive learning with turn-key examples against a Vagrant-based virtual machine that takes care of installing all 3rd party dependencies that are needed. The barriers to entry are truly minimal, which allows maximal use of the time to be spent on interactive learning.
The workshop is somewhat broadly designed and acclimates you to mining social data from Twitter, Facebook, LinkedIn, Google+, and GitHub APIs in five corresponding modules with the following memorable approach for each of them:
* Aspire - Set out to answer a question or test a hypothesis as part of a data science experiment
* Acquire - Collect and store the data that you need to answer the question or test the hypothesis
* Analyze - Use fundamental data mining techniques to explore and exploit the data
* Summarize - Present analytical findings in a compact and meaningful way
Each module consists of a brief period in which each attendee will customize the corresponding notebook for the module with their own account credentials with the remainder of the module devoted to learning what data is available from the API and exercises demonstrating analysis of the data—all from a pre-populated IPython Notebook. Time will be set aside at the end of each module for attendees to hack on the code, discuss examples, and ask any lingering questions.
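The four-step approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the posts are made-up strings standing in for an API response, and the function names (`acquire`, `analyze`, `summarize`) are hypothetical, not taken from the workshop notebooks.

```python
import re
from collections import Counter

# Aspire: state the question the experiment should answer.
QUESTION = "Which hashtags appear most often in a set of posts?"


def acquire():
    """Acquire: stand-in for a social web API call (hard-coded, made-up posts)."""
    return [
        "Learning #python at #pycon today",
        "IPython Notebook makes #datamining interactive #python",
        "Great tutorial on #datamining",
    ]


def analyze(posts):
    """Analyze: count hashtag frequencies across all posts, case-insensitively."""
    tags = Counter()
    for post in posts:
        tags.update(tag.lower() for tag in re.findall(r"#(\w+)", post))
    return tags


def summarize(tags, top_n=2):
    """Summarize: present the most frequent hashtags compactly."""
    return [f"#{tag}: {count}" for tag, count in tags.most_common(top_n)]


posts = acquire()
tag_counts = analyze(posts)
summary = summarize(tag_counts)
print(QUESTION)
print(summary)
```

In a real module, `acquire` would be the only step that touches account credentials and the network, which keeps the analysis and summary steps easy to rerun on stored data.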
NYAI - Scaling Machine Learning Applications by Braxton McKee - Rizwan Habib
Scaling Machine Learning Systems - (Braxton McKee, CEO & Founder, Ufora)
Braxton is the technical lead and founder of Ufora, a software company that develops Pyfora, an automatically parallel implementation of the Python programming language that enables data science and machine-learning at scale. Before founding Ufora with backing from Two Sigma Ventures and others, Braxton led the ten-person MBS/ABS Credit Modeling team at Ellington Management Group, a multi-billion dollar mortgage hedge fund. He holds a BS (Mathematics), MS (Mathematics), and M.B.A. from Yale University.
Braxton will discuss scaling machine learning applications using the open-source platform Pyfora. He will describe both the general approach and also some specific engineering techniques employed in the implementation of Pyfora that make it possible to produce large-scale machine learning and data science programs directly from single-threaded Python code.
NYAI - Interactive Machine Learning by Daniel Hsu - Rizwan Habib
Interactive learning - (Daniel Hsu)
Daniel Hsu is an assistant professor in the Department of Computer Science and a member of the Data Science Institute, both at Columbia University. Previously, he was a postdoc at Microsoft Research New England, and the Departments of Statistics at Rutgers University and the University of Pennsylvania. He holds a Ph.D. in Computer Science from UC San Diego, and a B.S. in Computer Science and Engineering from UC Berkeley. He received a 2014 Yahoo ACE Award, was selected by IEEE Intelligent Systems as one of "AI's 10 to Watch" in 2015, and received a 2016 Sloan Research Fellowship.
Daniel's research interests are in algorithmic statistics, machine learning, and privacy. His work has produced the first computationally efficient algorithms for numerous statistical estimation tasks (including many involving latent variable models such as mixture models, hidden Markov models, and topic models), provided new algorithmic frameworks for tackling interactive machine learning problems, and led to the creation of highly-scalable tools for machine learning applications.
NYAI - A Path To Unsupervised Learning Through Adversarial Networks by Soumit... - Rizwan Habib
A Path To Unsupervised Learning Through Adversarial Networks - (Soumith Chintala, Researcher at Facebook AI Research)
Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. He holds a Masters in CS from NYU, and spent time in Yann LeCun's NYU lab building deep learning models for pedestrian detection, natural image OCR, and depth images, among others.
Soumith will go over generative adversarial networks, a particular way of training neural networks to build high quality generative models. The talk will take you through an easy to follow timeline of the research and improvements in adversarial networks, followed by some future directions, as well as applications.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GridMate - End to end testing is a critical piece to ensure quality and avoid... - ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
3. WHAT WE DO
Cognitive Computing platform that understands human communication
OFFICE LOCATIONS:
Nashville
Washington
New York
London
INVESTORS:
Goldman Sachs, Credit Suisse, Nasdaq, In-Q-Tel, HCA & Lemhi Ventures
RESULTS PROVEN IN:
Government
Financial Services
Health Care
Data Science
STRATEGIC PARTNERS
DIGITAL REASONING
4. AGENDA
• The best way to operationalize machine learning is with data science
• Data science teams that can accomplish more experiments in less time will outperform those that don’t
6. HUMAN LANGUAGE IS HIGHLY PLASTIC
Would you rather try to build something awesome by sculpting plastic or by composing Legos?
7. BETTER ABSTRACTIONS YIELD BETTER OUTCOMES
Practitioners of equal ability will be able to build far more useful things with Legos than by sculpting plastic with artisan tools.
11. [Slide graphic: a news passage rendered as a sequence of part-of-speech tags (NN, NNP, VB, JJ, IN, DT, ...), with analysis layers labeled Metadata, Tokens, Phrases, Entities, Concepts, Temporal Reasoning, Assertions, Relationships, and Concept Resolution]
15. 14
Concept Mention Predicate Related Entity Fact Category
Sentimen
t
Sentence
World powers end North Korea Action Negative
World powers must end the “vicious circle” of responding to periodic North Korean provocations
with actions that reward such behavior, South Korean President Park Geun-hye told Congress
yesterday.
Park Geun-hye
South Korean
President Park
Geun-hye
tell Congress Statement Negative
World powers must end the “vicious circle” of responding to periodic North Korean provocations
with actions that reward such behavior, South Korean President Park Geun-hye told Congress
yesterday.
North Korea
North Korea’s
threats
undermine Korean Peninsula Conflict Negative
North Korea’s threats, including nuclear and missile tests, undermine security on the Korean
peninsula and will be “met decisively,” she said.
Park Geun-hye she say North Korea Statement Negative
North Korea’s threats, including nuclear and missile tests, undermine security on the Korean
peninsula and will be “met decisively,” she said.
South Korean
Government
strong South
Korean
government
ensure North Korea Communication Positive
A strong South Korean government “backed by the might of our alliance” ensures that “no North
Korean provocation can succeed,” she said.
Park Geun-hye she say
South Korean
Government
Statement Positive
A strong South Korean government “backed by the might of our alliance” ensures that “no North
Korean provocation can succeed,” she said.
North Korea North Korea threaten South Korea Conflict Negative
Park said there has been a historical pattern in which North Korea threatens South Korea and, after
a period of international sanctions, nations try “to patch things up” by offering “concessions and
rewards” to the Pyongyang government.
Park Geun-hye Park say North Korea Statement Negative
Park said there has been a historical pattern in which North Korea threatens South Korea and, after
a period of international sanctions, nations try “to patch things up” by offering “concessions and
rewards” to the Pyongyang government.
nations patch up Pyongyang government Communication Negative
Park said there has been a historical pattern in which North Korea threatens South Korea and, after
a period of international sanctions, nations try “to patch things up” by offering “concessions and
rewards” to the Pyongyang government.
North Korea North Korea advance its nuclear weapons capabilities Motion Negative In the meantime, North Korea continues to advance its nuclear weapons capabilities, she said.
Park Geun-hye she say North Korea Statement Negative In the meantime, North Korea continues to advance its nuclear weapons capabilities, she said.
Park Geun-hye she say vicious circle Statement Negative “It’s time to put an end to this vicious circle,” she said, drawing a standing ovation.
Park Geun-hye she draw standing ovation Action Positive “It’s time to put an end to this vicious circle,” she said, drawing a standing ovation.
Park Geun-hye Park’s address follow President Obama Communication Neutral
Park’s address to a joint meeting of Congress yesterday followed talks Tuesday with President
Obama…
the two leaders display unity Relationship Neutral
...at which the two leaders sought to display unity between the United States and South Korea in
response to North Korean threats.
two longtime allies be united Relationship Positive Obama said the two longtime allies are “as united as ever.”
President Barack Obama Obama say two longtime allies Statement Positive Obama said the two longtime allies are “as united as ever.”
Park Geun-hye Park make first trip abroad Travel Neutral
Park, three months into her presidency, is making her first trip abroad to mark the 60th anniversary
of the U.S.-South Korean alliance.
Park Geun-hye Park mark 60th anniversary Relationship Neutral
Park, three months into her presidency, is making her first trip abroad to mark the 60th anniversary
of the U.S.-South Korean alliance.
Two nations expand cooperation Relationship Positive The two nations are seeking to expand cooperation on trade and energy as well as security.
Park Geun-hye Park thank United States Communication Neutral
Park thanked the United States for its support in the Korean War, singling out for recognition four
lawmakers who are veterans of that conflict…
Park Geun-hye Park stress importance Communication Neutral
…and she stressed the importance South Korea places on the alliance in the face of security
challenges.
South Korea South Korea maintain readiness Status Positive
South Korea is maintaining the “highest level of readiness” and responding to North Korea’s actions
“resolutely but calmly,” she said…
[Figure: processing stages: Metadata, Tokens, Phrases, Entities, Concepts, Temporal Reasoning, Assertions, Relationships, Concept Resolution]
16. KNOWLEDGE GRAPHS: THE NEXT WAVE OF INNOVATION
• Document analysis is becoming commoditized
• The synthesis of knowledge graphs from a corpus is the next frontier
• Knowledge graphs will accelerate conversational interfaces/agents
• Conversational interfaces are a key enabler of the Internet of Things
20. KNOWLEDGE GRAPHS: ENTITIES IN RELATIONSHIP, TIME, & SPACE
[Figure: knowledge graph with entity nodes including Canon, Hong Kong, and Park Geun-hye; processing stages shown: Metadata, Tokens, Phrases, Entities, Concepts, Temporal Reasoning, Assertions, Relationships, Concept Resolution]
21. THESIS
• The best way to operationalize machine learning is with data
science
• Practicing data science requires careful application of the scientific method
with repeatable and well-defined experiments
24. MOST IMPORTANT KPI FOR DATA SCIENCE
• Optimizing for power output is the most important KPI for data
science practitioners
• Work ~ Experiment
• Power ~ Experiments per unit time
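The analogy above (work is an experiment, power is experiments per unit time) can be turned into a number a team tracks. The following is a minimal sketch, not the speaker's tooling: the function name `experiments_per_week`, the choice of a week as the unit of time, and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def experiments_per_week(completed_at, window_start, window_end):
    """Return completed experiments per week within [window_start, window_end).

    `completed_at` is an iterable of datetimes, one per finished experiment.
    The weekly unit is an illustrative assumption; any unit of time works.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    count = sum(1 for t in completed_at if window_start <= t < window_end)
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return count / weeks


# Hypothetical completion times for one team over a two-week window.
start = datetime(2016, 11, 1)
end = start + timedelta(weeks=2)
finished = [start + timedelta(days=d) for d in (1, 3, 4, 8, 9, 13)]

rate = experiments_per_week(finished, start, end)
print(rate)  # six experiments over two weeks
```

Tracking this rate over successive windows shows whether better tools, experiments, know-how, or teamwork are actually raising power output.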
11/25/2016
25. OPTIMIZE FOR POWER OUTPUT
• Optimize for power output by doing more experiments in less time
• Doing it with…
• Better tools*
• Better experiments*
• Better know-how
• Better teamwork
26. BEST PRACTICES FOR EXPERIMENTS
• An experiment should yield an artifact that tests a hypothesis
• Repeatable experiments yield momentum
• Repeatability => Collaboration => Innovation => Momentum
• Progress should be measured with scorecards
• Think:
• Chemistry lab
• Test-driven development
27. AN EXPERIMENT IS THE FUNDAMENTAL UNIT OF WORK
• An Experiment is a tuple:
• Versioned Training Data
• Versioned Evaluation Data
• Versioned Source Code
• Versioned Hyperparameters
• Versioned Tests
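The tuple above maps naturally onto an immutable record. The sketch below is one possible encoding, assuming each component is identified by a version string; the class name `Experiment`, the `fingerprint` helper, and the sample version labels are hypothetical and do not come from the talk.

```python
import hashlib
from typing import NamedTuple


class Experiment(NamedTuple):
    """An experiment as a tuple of five versioned components."""
    training_data: str      # e.g. a dataset snapshot identifier
    evaluation_data: str
    source_code: str        # e.g. a commit hash
    hyperparameters: str
    tests: str

    def fingerprint(self) -> str:
        """Short stable ID derived from all five version identifiers."""
        joined = "\n".join(self)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


# Hypothetical version labels, for illustration only.
baseline = Experiment("train@v3", "eval@v1", "git:4f2a9c1", "params@v7", "tests@v2")
rerun = Experiment("train@v3", "eval@v1", "git:4f2a9c1", "params@v7", "tests@v2")
tuned = baseline._replace(hyperparameters="params@v8")

print(baseline.fingerprint() == rerun.fingerprint())  # same versions, same ID
print(baseline.fingerprint() == tuned.fingerprint())  # one version changed
```

Because the ID depends only on the five versions, two people who pin the same versions are by construction running the same experiment, which is what makes it repeatable.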
29. EXPERIMENTAL ILLUSTRATION
• Define a hypothesis with a quantifiable outcome that can be tested:
• I can teach a machine to diagnose cancer from medical reports with precision
of 95% and recall of 85%.
• Build a model that yields an “IF CANCER” document label
• Yielding a “WHICH CANCER” document label naturally follows
• Test the outcome:
• Build a predictive model that “reads” the pathology reports and predicts
cancer with a quantifiable confidence level
• Wash, Rinse, Repeat…
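The hypothesis above is testable because it names two thresholds: precision of 95% and recall of 85%. The sketch below shows one way to check an "IF CANCER" document label against them. It assumes binary gold and predicted labels per report; the function names and the sample labels are hypothetical, and no real model or data is involved.

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for binary labels (True = cancer)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def hypothesis_holds(gold, predicted, min_precision=0.95, min_recall=0.85):
    """True when both thresholds from the stated hypothesis are met."""
    precision, recall = precision_recall(gold, predicted)
    return precision >= min_precision and recall >= min_recall


# Made-up evaluation set: 20 positive and 20 negative reports.
gold = [True] * 20 + [False] * 20
run_a = [True] * 18 + [False] * 2 + [False] * 20               # 18 TP, 0 FP, 2 FN
run_b = [True] * 16 + [False] * 4 + [True] + [False] * 19      # 16 TP, 1 FP, 4 FN

print(precision_recall(gold, run_a), hypothesis_holds(gold, run_a))
print(precision_recall(gold, run_b), hypothesis_holds(gold, run_b))
```

Each pass through "Wash, Rinse, Repeat" then amounts to changing one versioned input and re-running the same check.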
30. EXPERIMENTAL ILLUSTRATION
[Figure: part-of-speech tag sequence (NN, JJ, VB, ...) for the example document]
33. EXPERIMENTAL ILLUSTRATION
Medical Entity          Flag
lung nodule             Yes
bronchial wall          Yes
pulmonary embolism      No
lobe infiltrate         No
pleural effusion        No
pericardial effusion    No
pleural mass            No
pericardial mass        No
mediastinum             No
hilum                   No
aortic aneurysm         No
heart                   No
abdomen                 No
lungs                   No
lymph nodes             No
34. SUMMARY
• The best way to operationalize machine learning is with data
science
• Data science necessarily involves highly repeatable experiments
that are contextualized within the scientific method
• The most important KPI for data science teams is number of
experiments per unit time
• Data science teams that thoughtfully consider this KPI while
accomplishing more experiments in less time will outperform those
that don’t
35. I HAVE THE HONOR TO BE,
YOUR OBEDIENT SERVANT…
M.R.
• Matthew A. Russell
• @ptwobrussell
• LinkedIn
• Gmail
• Twitter
• Digital Reasoning
• http://digitalreasoning.com
• @dreasoning
37. WHAT OUR CUSTOMERS & PARTNERS ARE SAYING …
“Using Synthesys gives our team the
means to discover potential
problems and act on them before
they ripen into actual problems”
Vinny Tortorella, Chief Compliance &
Surveillance Officer
“Digital Reasoning provides the
proactive identification of potential
risks across our business and
continuous learning of resulting
reviews”
Will Davis, Global Head of Compliance
& Operational Risk Control Technology
“Congratulations to Digital Reasoning
on being recognized as a leader in Big
Data Text Analytics. We are excited to be
working with Digital Reasoning and its
award-winning technology”
Valerie Bannert-Thurner, Global Head,
Risk & Surveillance Solutions
38. WHAT OTHERS ARE SAYING …
“Banks now want to go one step
further, and are looking at acquiring
technology that can spot and prevent
inappropriate communication or
fraudulent activity… There is a huge
market for this right now,” said Sang
Lee, founding partner at Aite Group
“Digital Reasoning applies AI to
understand human communication to
ferret out suspicious
activity. Over time, this class of service
may become indispensable,” Gartner Cool
Vendor Smart Machines
"By continually learning from context,
Synthesys reveals insights that normally
go undetected, helping to avoid the “I-
don’t-know-what-I-don’t-know”
problem of most other analytics tools