Using AI/ML in drug discovery to repurpose new drugs. General cautions about the use of artificial intelligence and general pitfalls and best practices for generating data.
Correctness in Data Science - Data Science Pop-up Seattle
Domino Data Lab
Presented by: Benjamin S. Skrainka is a Principal Data Scientist and Lead Instructor at Galvanize, Inc. For several decades, he has built practical solutions to relevant problems using the best statistical and engineering tools. His expertise spans several problem domains, including sequencing DNA, estimating demand for differentiated products, measuring advertising efficacy, and forecasting for capacity planning. Ben earned an AB in Physics from Princeton University and a PhD in Economics from University College London.
Clare Corthell: Learning Data Science Online
sfdatascience
Clare Corthell, Data Scientist and Designer at Mattermark, and author of the Open Source Data Science Masters, shares her experience teaching herself data science with online resources. http://datasciencemasters.org/
What data scientists really do, according to 50 data scientists
Hugo Bowne-Anderson
My talk at PyData NYC, 2018.
This is the abstract:
Hugo Bowne-Anderson, data scientist and host of the DataFramed podcast, will give you a view into the thinking of 50 leading data scientists from around the world about the trends driving the data science revolution. During his interviews with these thought leaders, Hugo discovered themes and lessons about the past, present, and future of data science.
Uncertainty Quantification in Complex Physical Systems. (An Introduction)
Ogechi Onuoha
An introduction to the focus of my research. I presented this to the members of the Pipeline research group, University of Lagos, Nigeria. I will be making subsequent presentations as well as paper reviews on the same topic.
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
HackerEarth is pleased to announce its next session to help you understand what it really takes to become a data scientist.
Agenda of this session will include answers to the following questions:
- Why is it the best time to take up Data Science as a career?
- How can you take the first step in Data Science? (After all, first step is always the hardest!)
- How can you become better and progress fast?
- How is life after becoming a Data Scientist?
Speaker:
Jesse Steinweg-Woods is soon-to-be a Senior Data Scientist at tronc, working on recommender systems for articles and understanding customer behavior. Previously, he worked at Argo Group Insurance on new pricing models that took advantage of machine learning techniques. He received his PhD in Atmospheric Science from Texas A&M University, and his research focused on numerical weather and climate prediction.
What basics should a Data Science course cover, and what should a Data Scientist know?
1-Algorithm
2-Data
3-Ask The Right Question
4-Predict an answer
5-Copy other people's work to do data science
Is Agile Data Science just two buzzwords put together? I argue that agile is a very practical and applicable methodology that works well in the real world for all sorts of Analytics and Data Science workflows.
http://theinnovationenterprise.com/summits/digital-web-analytics-summit-london-2015/schedule
Operationalizing Machine Learning in the Enterprise
mark madsen
TDWI Munich 2019
What does it take to operationalize machine learning and AI in an enterprise setting?
Machine learning in an enterprise setting is difficult, but it seems easy. All you need is some smart people, some tools, and some data. It’s a long way from the environment needed to build ML applications to the environment to run them in an enterprise.
Most of what we know about production ML and AI comes from the world of web and digital startups and consumer services, where ML is a core part of the services they provide. These companies have fewer constraints than most enterprises do.
This session describes the nature of ML and AI applications and the overall environment they operate in, explains some important concepts about production operations, and offers some observations and advice for anyone trying to build and deploy such systems.
Data science is having a growing effect on our lives, from the content we see on social media feeds to the decisions businesses are making. Along with successes, data science has inspired much hype about what it is and what it can do. So I plan to demystify data science and discuss what it really is. What does a day-in-the-life look like? What tools and skills are needed? How is data science successfully applied in the real world? In this talk, I’ll be providing insight into these questions and also speculating about the future of data science and its place in business and technology.
Presented at OpenWest 2018
Data Science Popup Austin: Privilege and Supervised Machine Learning
Domino Data Lab
Watch talk ⇒ http://bit.ly/1SGuwNs
I'll use the example of sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society. A sentiment analysis algorithm is considered ‘table stakes’ for any serious text analytics platform in social media, finance, or security. As an example of supervised machine learning, I'll show how these systems are trained. But I'll also show that they have the unavoidable property that they are better at spotting unsubtle expressions of extreme emotion. Such crude expressions are used by a particularly privileged group of authors: men. In this way, brands that depend on sentiment analysis to 'learn what people think' inevitably pay more attention to men. The problem doesn't stop with sentiment analysis: at every step of any model building process, we make choices that can introduce bias, enhance privilege, or break the law! I'll review these pitfalls, talk about how you can recognize them in your own work, and touch on some new academic work that aims to mitigate these harms.
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
The Black Box: Interpretability, Reproducibility, and Data Management
mark madsen
The growing complexity of data science leads to black box solutions that few people in an organization understand. You often hear about the difficulty of interpretability—explaining how an analytic model works—and that you need it to deploy models. But people use many black boxes without understanding them…if they’re reliable. It’s when the black box becomes unreliable that people lose trust.
Mistrust is more likely to be created by the lack of reliability, and the lack of reliability is often the result of misunderstanding essential elements of analytics infrastructure and practice. The concept of reproducibility—the ability to get the same results given the same information—extends your view to include the environment and the data used to build and execute models.
Mark Madsen examines reproducibility and the areas that underlie production analytics and explores the most frequently ignored and yet most essential capability, data management. The industry needs to consider its practices so that systems are more transparent and reliable, improving trust and increasing the likelihood that your analytic solutions will succeed.
This talk will treat the black boxes of ML the way management perceives them, as black boxes.
There is much work on explainable models, interpretability, etc. that is important to the task of reproducibility. Much of that is relevant to the practitioner, but the practitioner can become too focused on the part they are most familiar with. Reproducing the results needs more.
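The definition of reproducibility given above (the same results given the same information, including the data and the environment) can be illustrated with a minimal Python sketch. The names `run_record` and `noisy_mean` are hypothetical, and the seeded subsample mean is only a stand-in for a real model fit; nothing here is taken from the talk itself.

```python
import hashlib
import json
import platform
import random
import sys


def run_record(rows, seed):
    """Capture the inputs that determine a result: data, seed, and environment."""
    data_bytes = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # fingerprint of the data
        "seed": seed,                                           # source of randomness
        "python": sys.version.split()[0],                       # interpreter version
        "platform": platform.platform(),                        # operating system details
    }


def noisy_mean(rows, seed):
    """Stand-in for a model fit: the mean of a seeded random subsample."""
    rng = random.Random(seed)
    sample = rng.sample(rows, k=max(1, len(rows) // 2))
    return sum(sample) / len(sample)


rows = list(range(100))
first = noisy_mean(rows, seed=7)
second = noisy_mean(rows, seed=7)
print(first == second)          # same data and seed give the same result
print(run_record(rows, seed=7))
```

Storing such a record next to each result makes it possible to tell whether a later discrepancy comes from changed data, a changed seed, or a changed environment.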
In this Data Science course (Graduate Program), I will focus on understanding business intelligence systems and helping future managers use and understand analytics, emphasizing the applications and implementations behind the concepts. It provides a solid foundation in BI that is reinforced with hands-on practice. The course is also designed as an introduction to programming and statistics for students from many different majors. It teaches practical techniques that apply across many disciplines and also serves as the technical foundation for more advanced courses in data science, statistics, and computer science.
I was recently asked what sort of things I have considered in my experience of performing data integration in pharma for drug discovery. So here are the ten things I thought most important!
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...
DataScienceConferenc1
Autonomy in targeting is a function that could be applied to any intelligent system, in particular the rapidly expanding array of robotic systems, in the air, on land and at sea – including swarms of small robots. This is an area of significant investment and emphasis for many armed forces, and the question is not so much whether we will see more intelligent robots, but whether and by what means they will remain under human control. Today’s remote-controlled weapons could become tomorrow’s autonomous weapons with just a software upgrade. The central element of any future autonomous weapon system will be the software. Military powers are investing in AI for a wide range of applications, and significant efforts are already underway to harness developments in image, facial and behavior recognition using AI and machine learning techniques for intelligence gathering and “automatic target recognition” to identify people, objects or patterns. Although not all autonomous weapon systems incorporate AI and machine learning, this software could form the basis of future autonomous weapon systems.
The talk deals with the recent success in AI, and how it is transferable to the medical domain. Despite the progress in machine learning, the medical domain has not profited so much from the hype.
OSINT Black Magic: Listen who whispers your name in the dark!!!
Nutan Kumar Panda
Open Source Intelligence is the art of collecting information that is scattered across publicly available sources. With the evolution of social media and digital marketplaces, a huge amount of information is constantly generated on the Internet (sometimes even without our conscious consent). This is of great concern for organizations and businesses, as confidential data floating in the public domain may seriously harm their business integrity. All recent hacks are related to internal source code disclosure, API key leakage, known vulnerabilities in third-party plugins, data dump leaks, etc. Based on experience and robust research in this domain, the speakers have created a tool for this talk which will help all kinds of organizations monitor cyberspace effectively without much investment. This tool is a simple but effective solution capable of hearing digital whispers which are usually missed or ignored but shouldn’t be.
The top mistakes you're making in your Data Science interview - Omri Allouche
Omri Allouche
To be a great Data Scientist, you need to be a good mathematician, a curious analyst, a smart computer scientist and an expert in the problem domain. Furthermore, the field is moving so fast, you have to run at full speed just to stay in place. How should you balance these skills?
When interviewing candidates for Gong.io, we try to evaluate how well the candidate will tackle the large variety of research tasks we face, including Speech Recognition, Video and Audio analysis, NLP and statistical hypothesis testing. In this talk, I'll give an inside peek into our Data Science interview, and will list the top mistakes I see people make when preparing for Data Science interviews, hoping to help you excel in your next interview and next position.
You can view a low-quality recording of the talk at https://www.youtube.com/watch?v=yu0HAudwGEA
Bio:
Omri Allouche heads the Research department at Gong.io, helping sales organizations improve their performance by providing actionable, data-driven insights using machine learning.
He also teaches Applied Data Science at Bar Ilan University, and was the founder and CEO of Page2site (acquired by Algomizer), an algorithms engineer at Elisra, and researcher at IDF's intelligence unit.
Omri holds a Ph.D. in Computational Ecology from the Hebrew University (cum laude). He has won several academic awards and scholarships, including the Clore fund, and his research papers have been cited over 2,000 times.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/09/responsible-ai-tools-and-frameworks-for-developing-ai-solutions-a-presentation-from-intel/
Mrinal Karvir, Senior Cloud Software Engineering Manager at Intel, presents the “Responsible AI: Tools and Frameworks for Developing AI Solutions” tutorial at the May 2023 Embedded Vision Summit.
Over 90% of businesses using AI say trustworthy and explainable AI is critical to business, according to Morning Consult’s IBM Global AI Adoption Index 2021. If not designed with responsible considerations of fairness, transparency, preserving privacy, safety and security, AI systems can cause significant harm to people and society and result in financial and reputational damage for companies.
How can we take a human-centric approach to design AI solutions? How can we identify different types of bias and what tools can we use to mitigate those? What are model cards, and how can we use them to improve transparency? What tools can we use to preserve privacy and improve security? In this talk, Karvir discusses practical approaches to adoption of responsible AI principles. She highlights relevant tools and frameworks and explores industry case studies. She also discusses building a well-defined response plan to help address an AI incident efficiently.
Future of data science as a profession
Jose Quesada
How can you thrive in a future where machine learning has been popular for a few years already?
In this talk, I will give you actionable advice from my experience training serious data scientists at our retreat center in Berlin. You are going to face these pointy, hard questions:
- What is the promise of machine learning? Has it happened yet?
- Is it easy to take advantage of machine learning, now that most algorithms are nicely packaged in APIs and libraries?
- How much time should I spend getting good at machine learning? Am I good enough now?
- Are data scientists going to be replaced by algorithms? Are we all?
- Is it easy to hire talent in machine learning after the explosion of MOOCs?
While machine learning is an exciting subject, it is wrong to assume that it will solve all your problems. Scroll down to take a look at some myths in the machine learning field and how to overcome them.
Do No Harm: Do Technologists Need a Code of Ethics?
Thoughtworks
Nothing is neutral, and the technology we design and build isn’t objective. How do we ensure that what starts out as a great idea doesn’t unintentionally (or intentionally) harm? Trolling, racially biased algorithms, surveillance capitalism: how do we assess our creations through an ethical lens so our products don’t amplify social biases? Do we need a code of ethics? How do we build ethics into our practice?
In this talk Sofia explores these questions and builds on the conversations that are happening globally within the technology community. She also talks about the Responsible Tech Playbook that ThoughtWorks is building, which collates ethical frameworks and explores how to use them in the design and delivery of software.
SPEAKER:
Sofia Woods, Senior Experience Designer, ThoughtWorks
Sofia has over 10 years of experience solving complex problems and designing digital products, experiences and services across government, financial services, transport and the private sectors. She’s a multi-disciplined designer, experienced with the whole gamut of Human Centred Design approaches including UX research, user interface design, prototyping/testing and can apply this approach in large scale software delivery environments. Blending human centred design with strategy and technology, she creates meaningful experiences that transform.
Iconuk 2016 - IBM Connections adoption Worst practices!
Femke Goedhart
Whether you've implemented IBM Connections, are considering it, or are in the middle of the planning stages, there are wrong (and right) turns to take at every step. Join Femke to learn about misconceptions and tribulations others have faced while striving to become a socially enabled company. Hear real-world examples and often funny anecdotes from the trenches of adoption that show you how NOT to do it, with tips on how to do it better along the way.
Walk away with a grasp on what to focus on to make a success out of your IBM Connections environment.
Idiots guide to setting up a data science team
Ashish Bansal
Some nuggets of how I started the data science practice at Gale Partners on a budget. Presented at the Toronto Hadoop Users Group (THUG) in April, 2015.
Consider Your Own Black Box: Evaluating Human Intelligence Alongside Artifici...
Jack Pringle
Artificial intelligence (AI) systems are playing significant roles in decision-making processes that affect our lives. However, decisions made in a “black-box” fashion (such as algorithms hidden from view or evaluation), rarely inspire confidence or build trust. Moreover, opaque decision-making may run afoul of legal frameworks (for example the Fair Credit Reporting Act) that require support for certain decisions.
Because of the significance of the decisions that AI makes, the decision-making AI should be explainable and trustworthy.
Scientists from the National Institute of Standards and Technology (NIST) have proposed four fundamental principles for explainable AI:
• Explanation. Systems deliver evidence or reasons for all their outputs.
• Meaningful. Systems provide explanations that are meaningful or understandable to individual users.
• Explanation Accuracy. The explanation correctly reflects the system’s process for generating the output.
• Knowledge Limits. The system only operates under conditions for which it was designed or when the system reaches a sufficient confidence in its output. (If a system has insufficient confidence in its decision, it should not supply a decision to the user.)
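The Knowledge Limits principle can be illustrated with a minimal Python sketch: a decision function that abstains when its confidence is too low. The function name `decide`, the example labels, and the 0.8 threshold are illustrative assumptions, not part of the NIST proposal.

```python
from typing import Optional, Sequence


def decide(
    probabilities: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.8,  # assumed cut-off; a real system would calibrate this
) -> Optional[str]:
    """Return the most probable label, or None when confidence is insufficient."""
    if not labels or len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must be non-empty and equal in length")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return None  # abstain: do not supply a decision to the user
    return labels[best]


print(decide([0.93, 0.07], ["approve", "deny"]))  # confident enough to answer
print(decide([0.55, 0.45], ["approve", "deny"]))  # below the threshold, so it abstains
```

Returning `None` instead of a low-confidence label leaves the case to a human reviewer or a fallback process.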
This NIST draft also asks whether human decision-making can satisfy these principles. NIST concludes that human decision-making can only do so (if at all) in a limited way, due to how our brains consciously and unconsciously process information. Comparing AI system decision making with the human decision process can help us evaluate the relative risks and benefits of using AI systems, and learn more about the upside and pitfalls of our own human decision-making systems.
This presentation will address some of the cognitive biases (reasoning flaws) that may affect not only legal decision-making processes, but also health and well-being choices. By learning to be aware of cognitive bias and the way it may influence our thoughts and actions, we can improve our decision process, and hopefully our choices.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
2. What is AI?
Do Not Distribute - Copyright BioTeam, Inc., All Rights Reserved (2020)
AI means using empirical data to generate an algorithm that can predict, or make decisions on, new data.
• Artificial Intelligence (AI): methods where computers make decisions imitating humans
• Machine Learning (ML): methods that improve with experience via implicit algorithms
• Deep Learning (DL): methods where features are not explicitly outlined
Diagram: nested circles labeled AI, ML, and DL.
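The definition on this slide (empirical data in, a predictive algorithm out) can be illustrated with the simplest possible case. The sketch below is not from the slides: it fits a straight line by ordinary least squares to made-up measurements and then predicts on an input it has not seen. The function names and the numbers are hypothetical.

```python
# Fit y = slope * x + intercept to observed data by ordinary least squares,
# then use the fitted line to predict on a new input.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error on the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical measurements, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

model = fit_line(xs, ys)          # "generate an algorithm" from empirical data
print(predict(model, 5.0))        # "predict on new data"
```

Machine learning and deep learning methods replace the straight line with far more flexible functions, but the fit-then-predict shape stays the same.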
3. What is AI?
• It is not a black box
• Results are not fact
• It will probably not replace traditional methods
• Not difficult to get started
• It is a method with inputs and outputs
• Results are mathematical
• It may replace traditional methods
• A tool
4. When is it appropriate to use AI?
• Now
• You may be using it already
• When time-to-solution matters
• When throughput matters
• When there are many covariates to consider
• When modeling is too difficult and will take time to develop
• When you have lots of data
5. What if I don’t know where to start?
• Start with a specific problem statement
• Are there aliens out there?
• Can I identify something in an image?
6. Tips and Tricks
• Check out Google’s Teachable Machine https://teachablemachine.withgoogle.com
7. Tips and Tricks
• Check out TensorFlow’s Playground http://playground.tensorflow.org
8. Tips and Tricks
• Check out Andrej Karpathy’s blog http://karpathy.github.io/2019/04/25/recipe/
9. Data tips
• Your data will likely
• Have artifacts
• Be incomplete
• Be skewed
• Be biased
• Be wrong
• Be multi-modal
• Be noisy
This is part of AI. Embrace it. Accept now that your data will be bad.
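A quick audit makes several of these problems visible before any model is trained. The sketch below is an illustration, not something from the slides: it reports the fraction of missing values and a simple median-based skew measure for one numeric column, plus the label balance of a training set. The function names and sample values are hypothetical, and it assumes missing entries are stored as None and that at least one value is present.

```python
import statistics


def audit_column(values):
    """Summarize missing entries and skew for one numeric column (None = missing)."""
    present = [v for v in values if v is not None]
    missing_fraction = 1 - len(present) / len(values)
    mean = statistics.fmean(present)
    median = statistics.median(present)
    stdev = statistics.pstdev(present)
    # Median-based skew: 3 * (mean - median) / standard deviation.
    # A large positive value means a long right tail (or a few outliers).
    skew = 3 * (mean - median) / stdev if stdev > 0 else 0.0
    return {
        "missing_fraction": missing_fraction,
        "mean": mean,
        "median": median,
        "skew": skew,
    }


def label_balance(labels):
    """Fraction of each label, to spot an imbalanced training set."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return {label: count / len(labels) for label, count in counts.items()}


# Hypothetical column with two missing entries and one suspicious outlier.
measurements = [1.0, 1.2, None, 0.9, 1.1, 9.5, None, 1.0]
labels = [1, 1, 1, 0, 1, 1, 1, 1]

report = audit_column(measurements)
balance = label_balance(labels)
print(report)
print(balance)
```

A report like this does not fix bad data, but it shows which of the problems listed above apply before they silently shape a model.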
11. Tools for drug discovery
• Molecular docking aims to find drugs that fit into sites in an organism and interfere with typical function
• It can take minutes to days to sample a single molecule with various conformations
• We may not have a good idea of the target site
Image source: Wikimedia Commons
12. ChemProp
• A deep learning framework for drug discovery
• Developed by MIT’s CSAIL
• Pulls drugs from the Broad Repurposing Hub
• Uses a Message Passing Neural Network (MPNN)
• Input features are fairly simple
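The core idea of message passing can be sketched in a few lines. This is a deliberately simplified illustration and not ChemProp's implementation: it has no learned weights, and the toy molecule, feature encoding, and function names are all assumptions made for the example. Each atom holds a feature vector, each round adds the neighbors' vectors to its own, and a final sum over atoms gives one vector for the whole molecule.

```python
def message_passing_step(features, neighbors):
    """One round: each atom's new vector = its own vector + sum of neighbor vectors."""
    updated = []
    for atom, own in enumerate(features):
        message = [0.0] * len(own)
        for nb in neighbors[atom]:
            message = [m + f for m, f in zip(message, features[nb])]
        updated.append([o + m for o, m in zip(own, message)])
    return updated


def readout(features):
    """Sum the atom vectors into a single molecule-level vector."""
    return [sum(column) for column in zip(*features)]


# Toy molecule: the heavy atoms of ethanol, C-C-O.
# Hypothetical one-hot atom features: [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

hidden = features
for _ in range(2):                      # two rounds of message passing
    hidden = message_passing_step(hidden, neighbors)

molecule_vector = readout(hidden)
print(molecule_vector)
```

After two rounds each atom's vector reflects atoms up to two bonds away, which is why the graph structure alone can serve as a fairly simple input. A real network would apply learned transformations at every step and feed the molecule vector to a classifier.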
Data encoding for training data
SMILES Activity
COC1=CC(=C(C=C1)OC)C2=C3C=C(C(=O)C=C3OC4=CC(=C(C=C42)O)O)O 1
COC1=CC(=C(C=C1)/C=N/NC(=O)C2=NN(C(=N2)C3=CC=CC=C3)C4=CC=CC=C4)O 1
CN1C2=C(C=C(C=C2)NC(=O)CCl)N(C1=O)C 1
CCS(=O)(=O)N1C(CC(=N1)C2=CC(=CC=C2)NS(=O)(=O)C)C3=CC=C(C=C3)C 0
CCOC1=CC=C(C=C1)NC(=O)CSC2=NN=C(C=C2)C3=CC=CC=N3 1
CCOC1=CC=C(C=C1)CNC(=O)C2CCN(CC2)S(=O)(=O)C3=CC4=C(C=C3)NC(=O)CCC4 1
CCOC(=O)N1CCN(CC1)S(=O)(=O)C2=CC=C(C=C2)C(=O)NNC3=NC4=C(C=CC=C4S3)C 1
CCN(CC)S(=O)(=O)C1=CC=CC(=C1)C(=O)N[C@@H](C(C)C)C(=O)NNC(=O)C2=CC=CC=C2 0
CCN(CC)S(=O)(=O)C1=CC=C(C=C1)S(=O)(=O)N2CCCC2C(=O)O 1
CCN(CC)C1=CC(=C(C=C1)/C=N/NC(=O)C2=CC(=CC=C2)S(=O)(=O)NC3=CC=CC=C3OC)O 1
CCCN1C=NC2=C1C=C(C(=C2N)C)C 0
CCCN1C(=O)C(SC1=O)CC(=O)NC2=CC=C(C=C2)C 1
CCC(C)NC(=O)C1CCN(CC1)S(=O)(=O)C2=CC=CC3=C2N=CC=C3 0
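Training data of this shape is usually stored as a two-column CSV file. The sketch below is an assumption-laden illustration: the column names smiles and activity, the reading of 1 as active and 0 as inactive, and the helper names are not taken from the slides. It serializes a few rows from the table above and runs a cheap sanity check first, since a SMILES string that was wrapped or truncated in transit often shows up as unbalanced parentheses.

```python
import csv
import io

# (SMILES, activity) pairs copied from the table above.
# Assumption: 1 means active and 0 means inactive.
rows = [
    ("CN1C2=C(C=C(C=C2)NC(=O)CCl)N(C1=O)C", 1),
    ("CCCN1C=NC2=C1C=C(C(=C2N)C)C", 0),
    ("CCCN1C(=O)C(SC1=O)CC(=O)NC2=CC=C(C=C2)C", 1),
]


def balanced(smiles):
    """Cheap sanity check: parentheses open and close in a valid order."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def to_csv(rows):
    """Return CSV text with a header row; reject obviously malformed rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["smiles", "activity"])   # hypothetical column names
    for smiles, activity in rows:
        if not balanced(smiles) or activity not in (0, 1):
            raise ValueError(f"bad row: {smiles!r}, {activity!r}")
        writer.writerow([smiles, activity])
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

The parenthesis check is not a chemistry validator; a cheminformatics toolkit would be needed to confirm that each string parses as a real molecule.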
16. Playing with ChemProp
3CLpro inhibition prediction from SARS-CoV model
Drug Name SMILES Activity Probability
Zafirlukast Cc1ccccc1S(=O)(=O)NC(=O)c2cc(OC)c(cc2)Cc3cn(C)c4ccc(cc43)NC(=O)OC5CCCC5 0.72431216
Montelukast CC(C)(C1=CC=CC=C1CCC(C2=CC=CC(=C2)C=CC3=NC4=C(C=CC(=C4)Cl)C=C3)SCC5(CC5)CC(=O)O)O 0.60056485
Ritonavir CC(C)C1=NC(=CS1)CN(C)C(=O)NC(C(C)C)C(=O)NC(CC2=CC=CC=C2)CC(C(CC3=CC=CC=C3)NC(=O)OCC4=CN=CS4)O 0.51782315
Remdesivir CCC(CC)COC(=O)C(C)NP(=O)(OCC1C(C(C(O1)(C#N)C2=CC=C3N2N=CN=C3N)O)O)OC4=CC=CC=C4 0.46806238
Indinavir CC(C)(C)NC(=O)C1CN(CCN1CC(CC(CC2=CC=CC=C2)C(=O)NC3C(CC4=CC=CC=C34)O)O)CC5=CN=CC=C5 0.42568066
Carfilzomib CC(C)CC(C(=O)C1(CO1)C)NC(=O)C(CC2=CC=CC=C2)NC(=O)C(CC(C)C)NC(=O)C(CCC3=CC=CC=C3)NC(=O)CN4CCOCC4 0.40163301
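Scores like these are typically used to rank candidates for follow-up rather than to label them. The sketch below sorts the predicted probabilities from the table above and applies a cutoff. The 0.5 threshold is a hypothetical choice made for illustration and is not stated in the slides.

```python
# Predicted activity probabilities copied from the table above.
predictions = {
    "Zafirlukast": 0.72431216,
    "Montelukast": 0.60056485,
    "Ritonavir": 0.51782315,
    "Remdesivir": 0.46806238,
    "Indinavir": 0.42568066,
    "Carfilzomib": 0.40163301,
}

THRESHOLD = 0.5  # hypothetical cutoff, chosen only for this example

# Highest predicted probability first.
ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)

# Candidates the model scores at or above the cutoff.
shortlist = [name for name, probability in ranked if probability >= THRESHOLD]

for name, probability in ranked:
    flag = "*" if probability >= THRESHOLD else " "
    print(f"{flag} {name:<12} {probability:.3f}")
```

With this cutoff the shortlist holds the first three drugs in the table. Moving the threshold changes the list, so it is better treated as a way to prioritize laboratory work than as a verdict.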
17. Disclaimer:
This was an exercise to explore ChemProp, not SARS-CoV. These results are preliminary at best and need to be thoroughly explored and peer reviewed before any conclusions or medically relevant actions can be taken. Please note that the information presented has not been formally peer reviewed and expresses the opinions of the BioTeam.
In short: it was a toy example and does not constitute any medical advice!
Come talk to BioTeam about your scientific goals