Data Science is the study of the extraction of knowledge from data. What if we extract partial or inaccurate knowledge? This illusion of knowledge would lead us to make wrong decisions, with sometimes disastrous consequences such as in the case of medical diagnosis, security or other life and death situations. In this talk we’ll present ideas on how to validate the extracted knowledge by its predictive power and not by its ability to explain the past. We’ll also discuss special techniques for predicting very rare high value events.
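One way to picture the point about predictive power and rare events is the small sketch below. It is not taken from the talk: the labels, the two "models" and the helper names are all hypothetical, chosen only to show that on held-out data with rare positives, plain accuracy can look identical for a useless model and a useful one, while precision and recall tell them apart.

```python
def accuracy(y_true, y_pred):
    """Fraction of held-out cases predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the rare positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical held-out set: 2 rare events among 100 cases.
held_out_labels = [1, 1] + [0] * 98

# Hypothetical predictions from two models on the same held-out cases.
always_negative = [0] * 100          # never predicts the rare event
catches_one = [1, 0, 1] + [0] * 97   # finds one event, raises one false alarm

for name, preds in [("always_negative", always_negative),
                    ("catches_one", catches_one)]:
    print(name, accuracy(held_out_labels, preds),
          precision_recall(held_out_labels, preds))
```

Both models score the same accuracy here, which is why a rare-event model is usually judged on held-out precision and recall (or a cost-weighted variant) rather than on how well it fits past data.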
Plenary presentation at University of Mary Washington Faculty Academy 2012, Virginia. An exploration of visual practice in our daily practice as teachers, learners and global citizens as a form of sense-making and information sharing.
Conference Site: http://blog12.facultyacademy.org/giulia-forsythe/
Recording: http://vimeo.com/42419735
Common Core State Standards:
CCSS.ELA-Literacy.RL.3.1: Ask and answer questions to demonstrate understanding of a text, referring explicitly to the text as the basis for the answers.
CCSS.ELA-Literacy.RL.3.5: Refer to parts of stories, dramas, and poems when writing or speaking about a text, using terms such as chapter, scene, and stanza; describe how each successive part builds on earlier sections.
CCSS.ELA-Literacy.RL.3.10: By the end of the year, read and comprehend literature, including stories, dramas, and poetry, at the high end of the grades 2-3 text complexity band independently and proficiently.
CCSS.ELA-Literacy.RF.3.4a: Read with sufficient accuracy and fluency to support comprehension. Read on-level text with purpose and understanding.
CCSS.ELA-Literacy.RF.3.4c: Read with sufficient accuracy and fluency to support comprehension. Use context to confirm or self-correct word recognition and understanding, rereading as necessary.
CCSS.ELA-Literacy.SL.3.2: Determine the main ideas and supporting details of a text read aloud or information presented in diverse media and formats, including visually, quantitatively, and orally.
CCSS.ELA-Literacy.L.3.4a: Determine or clarify the meaning of unknown and multiple-meaning words and phrases based on grade 3 reading and content, choosing flexibly from a range of strategies. Use sentence-level context as a clue to the meaning of a word or phrase.
CCSS.ELA-Literacy.L.3.5b: Demonstrate understanding of word relationships and nuances in word meanings. Identify real-life connections between words and their use.
CCSS.ELA-Literacy.L.3.6: Acquire and use accurately grade-appropriate conversational, general academic, and domain-specific words and phrases, including those that signal spatial and temporal relationships.
Meyer-Practical tips for responsible and effective data sharing | Michelle N. Meyer
Practical tips for responsible and effective data sharing. Presentation at Society for Psychophysiological Research 2018 Annual Meeting, Quebec City, Oct. 3, 2018. Part of Special Symposium, OPEN SCIENCE: FROM PRE-REGISTRATION TO REPLICATION TO DATA SHARING.
A slide deck in answer to the question: If funders require their awardees to share data, will they be placing them in an untenable position with their IRBs? Presented Dec. 6, 2016, at Funders' Forum co-sponsored by the Center for Open Science and the Health Research Alliance (https://cos.io/decemberforum2016/#overview).
How to Think Straight- Cognitive Debiasing Pat Croskerry | SMACC Conference
"How to think straight: Cognitive de-biasing by Pat Croskerry
The number of preventable deaths of hospitalized patients in the US each year is estimated at 40,000-80,000. The figure for the ICU alone is estimated at 40,000, so the death rate must be at the higher end of the range. When settings outside the hospital are taken into account (ED, primary care), the overall number must be considerably higher.
While many factors contribute to diagnostic failure, a variety of sources suggest that physicians’ thinking has a lot to do with it. Dual Process Theory describes how the brain makes decisions in one of two modes: through fast, unconscious, intuitive processes (System 1) or through slower, conscious, analytical processes (System 2). Mental short-cuts (heuristics) and biases are predominantly located in the intuitive mode where we spend most of our conscious time, and this is where the majority of decision failures occur. Thinking straight essentially means achieving a good balance between System 1 and System 2 decision making, and much of our cognitive effort needs to go into monitoring what our unconscious brains are doing in System 1. This is referred to by a variety of terms: metacognition, reflection, mindfulness, and others. They all involve cognitive de-coupling from System 1 and characterize the process of cognitive de-biasing. This is not easily accomplished in the ED or any environment where decision density is often high, throughput pressure exists, resources may be limited, and where decision makers may be fatigued and/or sleep deprived.
While medicine has acquired a variety of strategies over the years for de-biasing clinicians, added benefits can be obtained by developing specific mindware to tackle particular biases. Clinicians need to be aware of the operating characteristics of the dual process model of decision making, of the prevalence and nature of biases, and of how to apply and sustain de-biasing mindware in their decision making.
"
Where AI will (and won't) revolutionize biomedicine | Paul Agapow
Presented at the AI & Big Data Expo, London, December 2022.
Given the hype and success of machine learning and AI in other fields, its application in healthcare is only natural.
- However, the actual successes in medicine have been limited, with a number of high-profile failures.
- Here, I propose that biology is uniquely complex, with our lack of domain knowledge limiting the application of AI.
- However, there is reason for cautious optimism, with AI-led approaches shifting the odds in our favour.
Reputable Sources in a Pandemic: How to Find and Evaluate Information You Can... | Kara Gavin
A look at the news media and medical publishing realms in the time of COVID-19, with information and resources for finding and evaluating information.
Presented 2/12/21 to the Metropolitan Detroit Medical Library Group
A brief presentation on important research ethical concepts for research proposals. Given for the UQU Medical Research Club "Your Journey Towards Research" held at King Abdullah Medical City, Makka
USTUN_ Digital Health Assembly Open Innovation Conference: Sharing Global Da... | Bedirhan Ustun
An inquiry into the use of Big Data in Health Information Systems as a new way of gathering data. It raises ethical questions about ownership and orientation, analytic approaches, and the political implications for society and decision making.
As the importance of having a data strategy in place is sinking in, many organizations have added a chief data officer (CDO) to their executive team to help create and implement that strategy. But every organization is doing this a little bit differently. This talk will describe how a variety of industries and organizations are using CDOs and will make recommendations for best practices.
I’ll present the new knowledge discovery tools we are building at Diffeo. Unlike traditional search engines that use keywords, Diffeo provides an in-browser knowledge base that accelerates information gathering about people, companies, chemical compounds, cyber events, or other real world entities. I’ll describe how Diffeo uses active learning to encourage long and deep user interactions in order to recommend new content for in-progress articles. As you write, the search results get better and more interesting, because the system can see more precisely which entity you mean and which you don’t (disambiguation) and also what you don’t know yet about the entity (discovery).
Finally in this presentation I’ll describe our experience organizing the Text REtrieval Conference (TREC) on Knowledge Base Acceleration (KBA) and Dynamic Domain (DD) which are pushing the state of the art in knowledge discovery on large streams. I’ll show you how to access the largest corpus of streaming text data ever released for public evaluations.
An exposé on human-centered design, as related to data science and “medium data”. Examples of great API design will be showcased, as well as other end-user facing tools that can enable data scientists to share their observations with the world.
Mobile technology Usage by Humanitarian Programs: A Metadata Analysis | odsc
CommCare, developed by Dimagi Inc., is an open-source mobile technology platform that supports hundreds of humanitarian frontline programs worldwide. The objective of this analysis is to demonstrate how CommCare metadata contains a wealth of information that can inform humanitarian programs in their use of mobile technology. This understanding can help programs determine the most effective way to implement CommCare or other mobile technology in resource-poor settings. A typical CommCare user is a frontline worker, such as a community health worker who provides outreach to pregnant women and children. An important feature of CommCare is that it supports case management, allowing users to register, update, and close cases in their CommCare application. A case is usually a user’s client, e.g., a pregnant woman who is supported by the CommCare user. While using CommCare, the user fills out electronic forms which eventually get submitted to the CommCare cloud server. The cumulative number of forms submitted by CommCare users as of December 2014 was just over 10 million. Metadata for each form submitted through CommCare are stored in Dimagi’s data platform; included in a form’s metadata are date and time stamps for when each form was started and ended by the user and when the form was eventually received by the cloud server.
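As a rough illustration of what the three time stamps described above make possible, the sketch below derives how long a form took to fill out and how long it waited before reaching the server. The field names (`time_start`, `time_end`, `received_on`) and the sample values are assumptions made for this example; they are not confirmed names from CommCare or Dimagi's data platform.

```python
from datetime import datetime


def form_timings(meta):
    """Derive two durations from one form's metadata record.

    `meta` is a dict with hypothetical ISO 8601 fields:
    time_start, time_end (set on the device) and received_on (set by the server).
    """
    started = datetime.fromisoformat(meta["time_start"])
    ended = datetime.fromisoformat(meta["time_end"])
    received = datetime.fromisoformat(meta["received_on"])
    return {
        # Time the user spent inside the form.
        "fill_seconds": (ended - started).total_seconds(),
        # Delay between finishing the form and the server receiving it,
        # e.g. because the worker was offline in the field.
        "sync_lag_seconds": (received - ended).total_seconds(),
    }


# Hypothetical record: a form filled in 4.5 minutes and synced 8 hours later.
example_form = {
    "time_start": "2014-12-01T09:00:00",
    "time_end": "2014-12-01T09:04:30",
    "received_on": "2014-12-01T17:04:30",
}
timings = form_timings(example_form)
print(timings)
```

Aggregating these two numbers per user or per program is one plausible way such metadata could show, for instance, whether forms are filled during visits or entered in bulk later.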
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive | odsc
The main objective of this workshop is to give the audience hands-on experience with several Hadoop technologies and jump-start their Hadoop journey. In this workshop, you will load data and submit queries using Hadoop! Before jumping into the technology, the founders of DataKitchen review Hadoop and some of its technologies (MapReduce, Hive, Pig, Impala and Spark), look at performance, and present a rubric for choosing which technology to use when.
We’ve all been told to “work smarter, not harder.” But what does working smarter really mean? In the world of finance and trading, working smarter means working differently. None of us can compete against computers stacked inches away from the stock exchange or blue chip companies with multi-million dollar marketing campaigns. The key to winning is to go where the big guys haven’t, and the way to do that is through diverse datasets. In this talk, you will learn the theory and tools for finding new datasets from unexpected sources in order to gain an upper hand in both finance and business. So whether you’re a quant who trades in his bedroom or a restaurateur looking to grow his business, you’ll learn how the diversity of data can be the sharpest knife in your set.
Data Science at Dow Jones: Monetizing Data, News and Information | odsc
In this presentation I will describe the way Data Science supports the business of information and news at Dow Jones. Specifically, I will describe how we are introducing innovative and advanced large-scale information mining and analytic approaches not only into Dow Jones’ products but also into our strategy and decision making processes. Our goal is to impact every aspect of Dow Jones: from the way journalism is produced in the newsroom, to the way we create and deliver institutional products, to the way we improve retention and acquisition of subscribers. While the task seems broad and daunting, we have already achieved various successes through the application of machine learning, data mining, advanced analytics and big data approaches. In this presentation I will describe how we have achieved this, including our tools, data, approaches and mechanisms, as well as what our plans are going forward.
Have you been in the situation where you’re about to start a new project and ask yourself, what’s the right tool for the job here? I’ve been in that situation many times and thought it might be useful to share with you a recent project we did and why we selected Spark, Python, and Parquet. My plan is to take you through a use case that involves loading, transforming, aggregating, and persisting the dataset. We’ll use an open dataset consisting of full fund holdings graciously provided by Morningstar. My goals in presenting this use case are to have the audience learn how these technologies can be applied to a real-world problem and to inspire members of the audience to start learning these technologies and applying them to their own projects.
Building a Predictive Analytics Solution with Azure ML | odsc
Create and operationalize a predictive model using Microsoft Azure Machine Learning.
– Perform the typical steps involved in building a predictive analytics solution such as data ingestion, data cleansing, data exploration, feature engineering, model selection and evaluation of model results
– Learn how to use machine learning with big data scenarios using tools like Hadoop and SQL Server to process and work with such data.
Finding and classifying the mentions of the things named in text, often called Named Entity Recognition or NER, is a fundamental task in many search and analysis applications. Mature, robust NER technology is available for many languages and domains, from people, places, and products, to diseases, genes, and molecules. However, for emerging tasks like knowledge-base construction, mentions alone are insufficient.
In this presentation we’ll explore techniques that go beyond names to:
link mentions to one another and to rich knowledge sources like Wikidata
discover and characterise the relationships between entities that are explicit in the text
And we’ll discuss some of the most important practical implications of these advancements for open data science.
According to Credit Suisse’s Gender 3000 report, at the end of 2013, women accounted for 12.9% of top management in 3000 companies across 40 countries. However, since 2009, companies with women as 25-50% of their management team returned 22-29%. If companies with women in management outperform so dramatically, what would happen if you invested in women-led companies? Karen Rubin will explore this question and share her findings after running a 12-year investment simulation.
Data science allows us to turn a dark forest into a world of perpetual twilight by giving us the tools to better understand the data that surrounds us. Unfortunately, in this world of twilight we still need a flashlight to get a clean crisp image of our immediate surroundings. We will talk about how to use deep domain expertise as that flashlight shedding light on our understanding of data. Our focus will be on using text analysis as a means to examine qualitative information in a structured, quantitative way. We will draw heavily from examples in complex central bank policy and financial regulation.
Open Source Tools & Data Science Competitions | odsc
This talk shares the presenter’s experience with open source tools in data science competitions. In the past several years Kaggle and other competitions have created a large online community of data scientists. In addition to competing with each other for fame and glory, members of this community also generously share knowledge and insights through forums and open source code. The open competition and sharing have resulted in rapid progress in the sophistication of the entire community. This presentation will briefly cover this journey from a competitor’s perspective, and share hands-on tips on some open source tools proven popular and useful in recent competitions.
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.
The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
We will also cover how to build machine learning models on text data, and how to handle very large datasets.
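To make the idea of cross validation concrete, here is a minimal sketch in plain Python. It deliberately does not use scikit-learn (which offers its own utilities such as `KFold` and `cross_val_score`); the helper names, the toy threshold model and the data are all hypothetical and only illustrate the mechanics of splitting, fitting on the training folds and scoring on the held-out fold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=5):
    """Average held-out accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy model (an assumption for this sketch): a 1-D threshold placed midway
# between the class means; it assumes class 1 has the larger mean.
def fit_threshold(X, y):
    mean0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical, well-separated data with alternating classes.
X = [1.0, 9.0, 2.0, 8.0, 1.5, 9.5, 2.5, 8.5, 1.2, 9.2]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
score = cross_val_accuracy(fit_threshold, predict_threshold, X, y, k=5)
print(score)
```

Model selection then amounts to computing this score for each candidate model or setting and keeping the one with the best held-out result. Real data would normally be shuffled or stratified before splitting, which this sketch omits.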
Bridging the Gap Between Data and Insight using Open-Source Tools | odsc
Despite the proliferation of open-source tools for analysis (such as Python and R) and those used for visualization (such as Javascript / D3), there often exist significant gaps between these areas, and those of us trying to navigate the complete arc from data to insight can encounter many obstacles along the way. Fortunately, in recent years there have been many efforts to fill these needs, and today distilling a meaningful visualization from raw data is faster and easier than ever before.
In this talk we will use examples in geospatial analysis and visualization to illustrate how open-source tools like Python, geopandas, and TileMill work together. Using examples from the RunKeeper mobile app we will show how we currently use these tools to better understand our customers and their data, and to communicate with our colleagues, external partners, and the data community at large.
Human-generated text may be the next frontier for big data analysis, but we humans are complicated beasts and the text we generate is messy and complicated in ways that can confound analysis. We’ll describe the top ten mistakes people make when they start doing text analysis, and hopefully save you from making a few of these mistakes yourself.
Climate Impact of Software Testing at Nordic Testing Days | Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help address climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf | Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share these foundational concepts to build on:
Pushing the limits of ePRTC: 100ns holdover for 100 days | Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
A tale of scale & speed: How the US Navy is enabling software delivery from l... | sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
PHP Frameworks: I want to break free (IPC Berlin 2024) | Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk aims to encourage a more independent use of PHP frameworks, leading to more flexible and future-proof PHP development.
Enhancing Performance with Globus and the Science DMZ | Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview | Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and Sales | Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: The Art of Triggers and Actions in FME | Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
1. Richard BijjaniRichard Bijjani
JUMPING TO CONCLUSIONS
(Generating Improbable Insights)
Richard Robehr Bijjani, Ph.D.
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
2.
3. • IMAGE: Chart of controlled vs uncontrolled?
• How best to change behaviors
1/3 of all deaths globally are from cardiovascular disease
SOURCE: WORLD HEALTH ORGANIZATION
4. • IMAGE: Chart of controlled vs uncontrolled?
• How best to change behaviors
SOURCE: Mayo Clinic
#1 risk factor is high blood pressure
9. Quanttus is always on!
We capture > 50 million data points and > 400,000 vital sign measurements / person / day.
10.
Data Science @ Quanttus
Data Science ≡ Extraction of Actionable Knowledge from Data
Actionable Knowledge
Better Decisions
Meaningful Insights
Knowledge is actionable iff it has predictive power (not just an ability to explain the past)
11.
JUMPING TO CONCLUSIONS
Illusion of Knowledge
fatal
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.
- Stephen Hawking
13.
First a Joke!
A police officer approaches a man intently searching the ground under a lamppost.
• Policeman: What are you doing?
• Man: Looking for my car keys.
The officer helps for a few minutes without success.
• Policeman: Are you certain you dropped your keys near here?
• Man: No! I remember dropping them across the street.
• Policeman (very irritated): Why are you looking for them here then?
• Man: The light is much better here!
14.
Why Scientific Studies are so often Wrong
• Researchers tend to look for answers where the looking is good, rather than where the answers are likely to be hiding. (David Freedman)
• 15 of the 45 most prominent studies published in the top medical journals were ultimately refuted.
• 2/3 of all medical studies are wrong.
• 9/10 of leading-edge studies (like those linking a disease to a specific gene) are wrong.
John Ioannidis, University of Ioannina
15.
Why Scientific Studies are so often Wrong
10% to 20% of cases: delayed, missed, and incorrect diagnosis
Garber, et al., JAMA, 2005
16.
Why Scientific Studies are so often Wrong
40,000+ patients in US ICUs may die with a misdiagnosis annually
Winters, et al., BMJ Quality & Safety, 2012
17.
Why Scientific Studies are so often Wrong
50% of MDs are below-average
Vinod Khosla
18.
Why Scientific Studies are so often Wrong
Are you Immune to the Streetlight Effect?
Researchers tend to look for answers where the looking is good, rather than where the answers are likely to be hiding. (David Freedman)
• Think of the data you are working with: is it the ideal data, or just the conveniently available data?
• When was the last time you worked with ideal data?
• Have you ever?
19.
Expert Consensus
[Chart: seventeen experts’ estimates of the effect of screening on colon cancer deaths; horizontal axis: proportion of colon cancer deaths prevented, 0% to 100%]
20.
Should you trust your Dr.?
• Depends.
• If your ailment is common, your Dr. will do a decent job.
• If you’re suffering from a relatively uncommon disease, not so well.
“If you don’t find it often, you often don’t find it.” Jeremy Wolfe
21.
Weak Link, Humans!
1. Signals with low predictive value are not very useful
1 in 1000 does not hold one’s attention for long
2. Attention directed at one thing is attention drawn away from something else
Lost research/testing/treatment opportunities
22.
Data Scientists’ Tools of Choice
• Some scientists use only techniques they feel comfortable with.
• Others latch on to new ones without fully understanding them.
• Some just rely on available methods built into their software.
23.
The Nonsense Asymmetry Principle
The amount of energy needed to refute ‘Nonsense’ is an order of magnitude bigger than to produce it.
- Alberto Brandolini
24.
Data Scientific Method
• Validation can ONLY occur by measuring the predictive power of the insights, in addition to their ability to explain the past
• Data Science is Science and hence follows the Scientific Method
[Flowchart boxes: Ask Relevant Questions; Report Results; Research; Gather Data; Analyze Results; Validate Data; Construct Hypothesis; Design Experiment; Test Hypothesis; Hypothesis is True; Hypothesis not Valid]
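The point about predictive power versus explaining the past can be illustrated by scoring models on data they never saw. This is only a minimal sketch on synthetic data: the data-generating process and both "models" (a memorizer and a least-squares line) are hypothetical and do not come from the talk.

```python
import random

random.seed(0)

def make_data(n):
    # Hypothetical process used only for illustration: y = 2x + noise
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 1)) for x in xs]

past = make_data(200)     # data the models are built from
future = make_data(200)   # data held back to measure predictive power

def fit_line(data):
    # Ordinary least squares for y = a*x + b
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    a = (sum((x - mx) * (y - my) for x, y in data)
         / sum((x - mx) ** 2 for x, _ in data))
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(data):
    # "Explains" the past perfectly: returns the y of the nearest seen x
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

line = fit_line(past)
memo = fit_memorizer(past)

print("memorizer MSE on past / future:", mse(memo, past), mse(memo, future))
print("line      MSE on past / future:", mse(line, past), mse(line, future))
```

The memorizer reproduces the past with zero error yet predicts held-out data worse than the simpler line, which is the sense in which explaining the past alone does not validate an insight.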
25.
Ask the Important Question
• Deadly Virus
• Infects 1 in 1 million
• Diagnostic Test developed with 99.9% Sensitivity and Specificity
• Treatment developed
• 99% Curative
• 1% Deadly side effect
• Question: Would you recommend Diagnosis?
• Would you recommend Treatment?
26.
Efficacy of Treatment
• Do Nothing:
• 300 People Infected
• 300 People will Die
300M test subjects    Predicted Negative    Predicted Positive
Normal                299.7M                300,000
Infected              <1                    >299
• Diagnose and Treat:
• Infected Population:
• 296 Cured
• 3 + 1 Die
• Non-Infected Population:
• 297,000 Unaffected (except for the scare)
• 3,000 Die
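The arithmetic behind these numbers can be reproduced in a few lines. This is a sketch under the slide's stated figures (300M people, 1 in 1 million infected, 99.9% sensitivity and specificity, 99% cure, 1% deadly side effect); the variable names are illustrative, and it assumes, as the slide does, that every untreated infection is fatal and that every positive result is treated.

```python
population = 300_000_000
prevalence = 1 / 1_000_000        # infects 1 in 1 million
sensitivity = 0.999
specificity = 0.999
cure_rate = 0.99                  # 99% of treated infected people are cured
side_effect_death_rate = 0.01     # 1% of treated healthy people die

infected = population * prevalence
healthy = population - infected

# Confusion matrix of the diagnostic test
true_pos = infected * sensitivity
false_neg = infected - true_pos
false_pos = healthy * (1 - specificity)
true_neg = healthy - false_pos

# Chance that a positive result really is an infection
ppv = true_pos / (true_pos + false_pos)

# Option 1: do nothing (assumed: every infected person dies)
deaths_do_nothing = infected

# Option 2: diagnose and treat every positive result
cured = true_pos * cure_rate
deaths_infected = false_neg + (true_pos - cured)
deaths_healthy = false_pos * side_effect_death_rate
deaths_treat = deaths_infected + deaths_healthy

print(f"false positives: {false_pos:,.0f}, PPV: {ppv:.4%}")
print(f"deaths if we do nothing: {deaths_do_nothing:,.0f}")
print(f"deaths if we diagnose and treat: {deaths_treat:,.0f}")
```

Because roughly a thousand healthy people test positive for every infected one, treating all positives costs about ten times more lives than doing nothing, even with a 99.9% accurate test.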
27.
Good Practices
• Understand where the data comes from.
• Pre-process / Clean your data, but keep validated outliers.
• Own the tools and adapt them to your own requirements.
• Follow the Scientific Method.
• Analyze data to answer the question posed.
• Save a list of other interesting questions for later.
• Share your hypotheses with the team.
• Simple is better, at least make sure it’s deployable.
• Test, Validate, re-test.
• Communicate results correctly and set the right expectations.
"If you torture the data long enough, it will confess to anything." - Hal Varian
29.
Pitfalls of data mining
• The hope: data miners pore over large, diffuse sets of raw data trying to discern patterns that would otherwise go undetected.
• The dark side of data mining is to pick and choose from a large set of data to try to explain a small one
• “Given enough time, enough attempts and enough imagination, almost any conclusion can be teased out of any set of data”
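The "enough attempts" problem can be demonstrated with pure noise. The sketch below is an assumption-laden illustration, not anything shown in the talk: it generates a random target and 1,000 random candidate features (all hypothetical), then picks the feature that correlates best with the target.

```python
import random

random.seed(1)

def pearson(a, b):
    # Plain Pearson correlation coefficient
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

n_samples, n_features = 20, 1000

# Everything here is independent noise: there is nothing to discover
target = [random.gauss(0, 1) for _ in range(n_samples)]
features = [[random.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)]

# "Data mining": keep whichever feature happens to match the target best
best = max(features, key=lambda f: abs(pearson(f, target)))
r_seen = pearson(best, target)

# Fresh draws from the same noise process show what the "finding" predicts
fresh_feature = [random.gauss(0, 1) for _ in range(n_samples)]
fresh_target = [random.gauss(0, 1) for _ in range(n_samples)]
r_fresh = pearson(fresh_feature, fresh_target)

print(f"best correlation found in noise: {r_seen:+.2f}")
print(f"correlation on fresh data:       {r_fresh:+.2f}")
```

With a small sample and many attempts, a strong-looking correlation appears by chance alone, which is why a pattern found this way needs to be tested on new data before it counts as knowledge.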
30.
Limitations of Common Data Mining Techniques
• Automated feature selection methods cannot apply to rare (or unforeseen) events
• Normal events are similar, rare events are by definition unique
• Accuracy measurements are not appropriate
• Real time detection of rare events is necessary, but machine learning techniques construct models based on the past
• If you haven’t yet seen it, you cannot detect it!
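Why accuracy is not an appropriate measurement for rare events can be seen with a trivial example. The counts below are made up for illustration: 10 rare events among 10,000 cases, scored against a hypothetical "model" that always predicts normal.

```python
# 1 = rare event, 0 = normal (hypothetical data: 10 rare events in 10,000)
labels = [1] * 10 + [0] * 9_990

# Trivial model: always predict "normal"
predictions = [0] * len(labels)

pairs = list(zip(labels, predictions))
correct = sum(1 for y, p in pairs if y == p)
true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

accuracy = correct / len(labels)
recall = true_pos / (true_pos + false_neg)   # share of rare events caught

print(f"accuracy: {accuracy:.1%}")
print(f"recall:   {recall:.1%}")
```

The model scores 99.9% accuracy while catching none of the events of interest, so a measure such as recall (or a cost-weighted score) is more informative here.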
31.
What are rare/high-value events?
• Rare or Outliers
• Occur less than 1% of the time
• For large datasets, many samples exist. Balance could be achieved and traditional Data Mining Techniques could be applied
• Preferential sampling of rare class
• Under-sampling of majority class
• Extremely Rare or Anomalies
• Statistical chance of detection is zero
• Most databases don’t ‘naturally’ contain any samples
• Properties of target samples are not known
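Under-sampling of the majority class, mentioned above for the "rare" (not extremely rare) case, might look like the following. This is a minimal sketch: the function name, the `ratio` parameter and the toy rows are all hypothetical.

```python
import random

random.seed(0)

def undersample_majority(rows, label_of, ratio=1.0):
    """Keep every rare row and a random subset of majority rows.

    ratio is the number of majority rows kept per rare row.
    """
    rare = [r for r in rows if label_of(r) == 1]
    majority = [r for r in rows if label_of(r) == 0]
    keep = min(len(majority), int(len(rare) * ratio))
    return rare + random.sample(majority, keep)

# Toy dataset: (id, label) with 50 rare rows out of 10,000 (0.5%)
rows = [(i, 1 if i < 50 else 0) for i in range(10_000)]

balanced = undersample_majority(rows, label_of=lambda r: r[1])
n_rare = sum(label for _, label in balanced)

print(len(balanced), "rows kept,", n_rare, "of them rare")
```

This only works when enough rare samples exist to begin with; for extremely rare events there is nothing to balance against.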
33.
Anomalies vs. High Value Rare Events
By definition, anomalies are the exception, but not necessarily rare and/or of high value.
• Anomaly? Yes
• Rare Event? No
37.
Why incompatible?
• No Quality Control
Suicide Bomb Trainer in Iraq Accidentally Blows Up His Class
Terrorist ‘lab’ (redacted)
40.
Takeaway
• We are drowning in data, yet starving for knowledge
• In case of rare events, data may not be enough; the source of the data needs to be well understood
• To detect rare events: Sometimes it’s just more effective to generate heuristics
• Heuristics cannot predict, while machine learning assumes the future will resemble the past, and extremely rare events are not part of the past
• What to do?
41.
Outliers revisited
1. Retain outliers in data set for analysis.
2. Exclude only those that are known to be due to defective measurements or transcription errors.
• Need to understand data origin to accurately separate rare events from measurement errors.
3. Do not assume normal distribution.
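One way to follow these three rules is to flag outliers with a rule that does not rely on a normal distribution and to delete only values known to be errors. The sketch below uses Tukey's interquartile-range fences as one such rule; the heart-rate readings and the `known_sensor_errors` set are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    # Tukey fences: based on quartiles, no normality assumption
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heart-rate readings (beats per minute)
heart_rates = [62, 64, 65, 66, 67, 68, 70, 71, 72, 74, 75, 140, 0]

flagged = flag_outliers(heart_rates)

# Only remove values understood to be measurement errors
# (assumption: a reading of 0 means the sensor lost contact)
known_sensor_errors = {0}
kept = [v for v in heart_rates if v not in known_sensor_errors]

print("flagged for review:", flagged)
print("kept for analysis: ", kept)
```

Both 0 and 140 are flagged, but only the reading explained by the data's physical origin is dropped; the 140 stays in the analysis as a possible rare event.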
42.
Data Mining Techniques
• Supervised
• Pro: Human readable models
• Con: Requires labeled data
• Unsupervised
• Pro: Deviation detection, no labeling needed
• Con: Requires similarity measures, high false alarm rate (due to benign yet previously unseen data)
43.
Unsupervised Techniques
• Outlier datum defined as different from the rest of the data
• Rare event: Same definition
• Detection Approaches
• Statistics based
• Distance Based
• Model Based
44.
Unsupervised: Statistics
• Data modeled using stochastic distribution
• Advantages: no a priori knowledge required
• Disadvantages:
• Fails with high dimensions (curse of dimensionality)
• Does not identify patterns of rare events
• Sample implementations:
• Finite Mixtures Schemes, e.g. SmartSifter. Use histogram density to represent probability distribution
• Blocked Adaptive Computationally Efficient Outlier Nominator (BACON)
• Probability Distributions
• Entropy Measures
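The idea of using a histogram density to represent a probability distribution can be sketched in one dimension. This is a rough illustration only and not SmartSifter or BACON: the function name, the bin width and the sample values are hypothetical.

```python
import math
from collections import Counter

def histogram_scores(values, bin_width):
    # Estimate each value's probability from the share of data in its bin,
    # then score it by "surprise" = -log(probability)
    bins = [math.floor(v / bin_width) for v in values]
    counts = Counter(bins)
    n = len(values)
    return [-math.log(counts[b] / n) for b in bins]

# Hypothetical one-dimensional measurements
values = [10.1, 10.4, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0]

scores = histogram_scores(values, bin_width=1.0)
most_surprising = values[scores.index(max(scores))]

print("most surprising value:", most_surprising)
```

In many dimensions almost every bin is empty, which is one face of the curse of dimensionality listed among the disadvantages.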
45.
Unsupervised: Distance
• Distance computed between neighbors, and data points sorted
• Advantages: no a priori knowledge required
• Disadvantages:
• Not suitable for rare classes
• Sample implementations:
• k-Nearest Neighbor
• Mahalanobis Distance for skewed distributions
• Local Outlier Factor (LOF) for variable density clusters (average distance between points is different in different clusters)
• Specialized Clustering, Canopy, FindOut
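A k-nearest-neighbor distance score is probably the simplest of these to write down. The sketch below is a brute-force illustration with hypothetical points and a hypothetical function name; it scores each point by the distance to its k-th nearest neighbor and sorts on that score.

```python
import math

def knn_scores(points, k=2):
    # Score each point by the distance to its k-th nearest neighbor
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: a tight cluster plus one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]

scores = knn_scores(points, k=2)
ranked = sorted(zip(scores, points), reverse=True)
most_isolated = ranked[0][1]

print("most isolated point:", most_isolated)
```

If the rare events themselves form a small tight cluster, each one has close neighbors and scores as normal, which is one reading of "not suitable for rare classes".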
46.
Unsupervised: Model
• Predict normal behavior via model
• Capture deviations
• Detection Approaches
• Neural Networks, 4 layers, input = output
• Unsupervised support vector machines (SVM)
47.
Supervised Techniques
• Classification methods typically not suitable:
• Problem: Lack of labels
• Possible Solution: Balance the class size
• Duplicate rare events or down-size normal events
• Generate anomalies inversely proportional with data density
• Synthetically generate minority over-sampled events (e.g. SMOTE)
• Classify regions as ‘positive’ without having enough data in them
• Shrink: look for presence of positive labels, not majority
• PN-rule: Find regions of high recall (Pd), then prune false positives, then classify (avoid over-fitting)
• Decision Tree methods: Ripple Down Rules, CREDOS, Boosting Classifiers, Random Forest
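Synthetic over-sampling of the minority class can be sketched as interpolation between a rare sample and one of its nearest rare neighbors. This is a simplified, SMOTE-like illustration rather than the reference SMOTE algorithm; the function name, parameters and points are hypothetical.

```python
import math
import random

random.seed(0)

def smote_like(minority, n_new, k=2):
    # Create n_new synthetic points, each on the segment between a random
    # minority point and one of its k nearest minority neighbors
    synthetic = []
    for _ in range(n_new):
        p = random.choice(minority)
        neighbors = sorted((q for q in minority if q is not p),
                           key=lambda q: math.dist(p, q))[:k]
        q = random.choice(neighbors)
        t = random.random()   # position along the segment p -> q
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

# Hypothetical rare-class samples in two dimensions
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]

new_points = smote_like(minority, n_new=6)
for point in new_points:
    print(tuple(round(c, 3) for c in point))
```

The synthetic points stay inside the region already covered by the known rare samples, so this balances class sizes but cannot invent rare events of a kind never observed.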
48.
Cost Functions
• In any classification problem, one needs to minimize the cost function
• Selecting an appropriate cost function is key
• Weighting is also important, not all data points are created equal.
• Bayesian Thinking is necessary
• Temporal (time-series) Analysis requires different approaches.
• Is current data ‘surprising’ based on historical data created with the same underlying process?
• Opportunity for Insight, error, or rare event capture.
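How the choice of cost function changes a decision can be shown by picking a detection threshold under two different cost weightings. The scores, labels and cost values below are hypothetical and chosen only to make the effect visible.

```python
def total_cost(threshold, scored, cost_miss, cost_false_alarm):
    # Sum the cost of every error made when flagging scores >= threshold
    cost = 0
    for score, label in scored:
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += cost_miss           # rare event missed
        elif label == 0 and flagged:
            cost += cost_false_alarm    # normal case flagged
    return cost

# Hypothetical (model score, true label) pairs; 1 = rare event
scored = [(0.05, 0), (0.10, 0), (0.20, 0), (0.30, 1),
          (0.35, 0), (0.60, 0), (0.80, 1)]

thresholds = sorted({score for score, _ in scored})

# All errors cost the same
best_equal = min(thresholds, key=lambda t: total_cost(t, scored, 1, 1))

# Missing a rare event is assumed to cost 100x a false alarm
best_weighted = min(thresholds, key=lambda t: total_cost(t, scored, 100, 1))

print("threshold with equal costs:   ", best_equal)
print("threshold with weighted costs:", best_weighted)
```

With equal costs the best threshold accepts a missed event to avoid false alarms; once misses are weighted heavily, the threshold drops until both rare events are caught.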
49.
The Weak Link
No matter how good your automated system is, the final decision to act or not is often made by a human!
Present only relevant data to make the right decision
Actionable information
Present data in human readable format.
Visualization! Be creative, different.
50.
Detecting Extremely Rare Events
Data Collection
• Capture high SNR representative data
Pre-process
• Clean the data from known noise and artifacts
Feature Extraction
• Reduce data to meaningful features with no loss of desired signal
Classifier
• Divide data into training/testing and use appropriate classifiers
• Always use feature confidences
Identify Outliers
• Data that is not ‘normal’
• Determine if physically appropriate or measurement error. Delete errors.
Explain Data
• Any insights? What does it mean, sub-category classification
Present Data
• Visualization, UI, UX
51.
Conclusion
• Experiment, test. Iterate.
• Do your homework, learn the physical origin of your data.
• Pre-process data.
• Develop your own method; all methods have weaknesses and strengths, learn to combine.
• Know your customer. Stay focused on the ‘Question’.
• Simplify. Needs to run in the real world.
• Avoid bias: biased samples, biased outcome.
• Never use the test and validation data in the training phase, not even for scaling purposes.
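The last point, keeping test data out of the scaling step, can be sketched as follows. The data is synthetic and the 80/20 split is an arbitrary choice for illustration: the scaling statistics are computed from the training split only and then reused unchanged on the test split.

```python
import random
import statistics

random.seed(0)

# Hypothetical measurements, split 80/20 into train and test
data = [random.gauss(50, 10) for _ in range(100)]
random.shuffle(data)
train, test = data[:80], data[80:]

# Scaling parameters come from the training data ONLY
train_mean = statistics.fmean(train)
train_std = statistics.stdev(train)

def scale(values):
    # Apply the training-set parameters to any split
    return [(v - train_mean) / train_std for v in values]

train_scaled = scale(train)
test_scaled = scale(test)

print("train mean/std after scaling:",
      round(statistics.fmean(train_scaled), 6),
      round(statistics.stdev(train_scaled), 6))
print("test mean after scaling:",
      round(statistics.fmean(test_scaled), 3))
```

The test split will generally not come out with mean 0 and standard deviation 1, and that is expected: computing the statistics over all the data would leak information about the test set into training.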