Presentation at Data ScienceTech Institute campuses, Paris and Nice, May 2016 , including Intro, Data Science History and Terms; 10 Real-World Data Science Lessons; Data Science Now: Polls & Trends; Data Science Roles; Data Science Job Trends; and Data Science Future
A look back at how the practice of data science has evolved over the years, modern trends, and where it might be headed in the future. Starting from before anyone had the title "data scientist" on their resume, to the dawn of the cloud and big data, and the new tools and companies trying to push the state of the art forward. Finally, some wild speculation on where data science might be headed.
Presentation given to Seattle Data Science Meetup on Friday July 24th 2015.
Introduction to various data science. From the very beginning of data science idea, to latest designs, changing trends, technologies what make then to the application that are already in real world use as we of now.
A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Join this session to learn about the impact these trends and changes will have on the future of data science. If you are a data scientist, or if your organization relies on cutting edge analytics, you won't want to miss this!
Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision making.
Organizations are collecting massive amounts of data from disparate sources. However, they continuously face the challenge of identifying patterns, detecting anomalies, and projecting future trends based on large data sets. Machine learning for anomaly detection provides a promising alternative for the detection and classification of anomalies.
Find out how you can implement machine learning to increase speed and effectiveness in identifying and reporting anomalies.
In this webinar, we will discuss :
How machine learning can help in identifying anomalies
Steps to approach an anomaly detection problem
Various techniques available for anomaly detection
Best algorithms that fit in different situations
Implementing an anomaly detection use case on the StreamAnalytix platform
To view the webinar - https://bit.ly/2IV2ahC
A look back at how the practice of data science has evolved over the years, modern trends, and where it might be headed in the future. Starting from before anyone had the title "data scientist" on their resume, to the dawn of the cloud and big data, and the new tools and companies trying to push the state of the art forward. Finally, some wild speculation on where data science might be headed.
Presentation given to Seattle Data Science Meetup on Friday July 24th 2015.
Introduction to various data science. From the very beginning of data science idea, to latest designs, changing trends, technologies what make then to the application that are already in real world use as we of now.
A changing market landscape and open source innovations are having a dramatic impact on the consumability and ease of use of data science tools. Join this session to learn about the impact these trends and changes will have on the future of data science. If you are a data scientist, or if your organization relies on cutting edge analytics, you won't want to miss this!
Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision making.
Organizations are collecting massive amounts of data from disparate sources. However, they continuously face the challenge of identifying patterns, detecting anomalies, and projecting future trends based on large data sets. Machine learning for anomaly detection provides a promising alternative for the detection and classification of anomalies.
Find out how you can implement machine learning to increase speed and effectiveness in identifying and reporting anomalies.
In this webinar, we will discuss :
How machine learning can help in identifying anomalies
Steps to approach an anomaly detection problem
Various techniques available for anomaly detection
Best algorithms that fit in different situations
Implementing an anomaly detection use case on the StreamAnalytix platform
To view the webinar - https://bit.ly/2IV2ahC
Anomaly detection (or Outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. It is used is applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance. In this talk, we will introduce anomaly detection and discuss the various analytical and machine learning techniques used in in this field. Through a case study, we will discuss how anomaly detection techniques could be applied to energy data sets. We will also demonstrate, using R and Apache Spark, an application to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
Defining Data Science
• What Does a Data Science Professional Do?
• Data Science in Business
• Use Cases for Data Science
• Installation of R and R studio
Two hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
In this presentation, I have talked about Big Data and its importance in brief. I have included the very basics of Data Science and its importance in the present day, through a case study. You can also get an idea about who a data scientist is and what all tasks he performs. A few applications of data science have been illustrated in the end.
Data science is different from Data Analytics,Data Engineering,Big Data.
Presentation about Data Science.
What is Data Science its process future and scope.
Data Science Presentation By Amit Singh.
"Sexiest job of 21st century"
Introduction to the ethics of machine learningDaniel Wilson
A brief introduction to the domain that is variously described as the ethics of machine learning, data science ethics, AI ethics and the ethics of big data. (Delivered as a guest lecture for COMPSCI 361 at the University of Auckland on May 29, 2019)
The presentation is about the career path in the field of Data Science. Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Intro to Data Science for Enterprise Big DataPaco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide plus some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroData ScienceTech Institute
Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
My presentation on Data Mining, Lessons from Competitions, and Public Data looks at the Data Mining/Data Science/Big Data evolution, reviews lessons from KDD Cup 1997, Netflix Prize, and Kaggle, presents a big list of Public and Government data APIs, Marketplaces, Portals, and Platforms, and examines Big Data Hype. This talk was given at BPDM-2013, (Broadening Participation in Data Mining), Aug 10, 2013 held at KDD-2013, Chicago.
Anomaly detection (or Outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. It is used is applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance. In this talk, we will introduce anomaly detection and discuss the various analytical and machine learning techniques used in in this field. Through a case study, we will discuss how anomaly detection techniques could be applied to energy data sets. We will also demonstrate, using R and Apache Spark, an application to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
Defining Data Science
• What Does a Data Science Professional Do?
• Data Science in Business
• Use Cases for Data Science
• Installation of R and R studio
Two hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
In this presentation, I have talked about Big Data and its importance in brief. I have included the very basics of Data Science and its importance in the present day, through a case study. You can also get an idea about who a data scientist is and what all tasks he performs. A few applications of data science have been illustrated in the end.
Data science is different from Data Analytics,Data Engineering,Big Data.
Presentation about Data Science.
What is Data Science its process future and scope.
Data Science Presentation By Amit Singh.
"Sexiest job of 21st century"
Introduction to the ethics of machine learningDaniel Wilson
A brief introduction to the domain that is variously described as the ethics of machine learning, data science ethics, AI ethics and the ethics of big data. (Delivered as a guest lecture for COMPSCI 361 at the University of Auckland on May 29, 2019)
The presentation is about the career path in the field of Data Science. Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Intro to Data Science for Enterprise Big DataPaco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide plus some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroData ScienceTech Institute
Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
My presentation on Data Mining, Lessons from Competitions, and Public Data looks at the Data Mining/Data Science/Big Data evolution, reviews lessons from KDD Cup 1997, Netflix Prize, and Kaggle, presents a big list of Public and Government data APIs, Marketplaces, Portals, and Platforms, and examines Big Data Hype. This talk was given at BPDM-2013, (Broadening Participation in Data Mining), Aug 10, 2013 held at KDD-2013, Chicago.
Nicholas Jewell MedicReS World Congress 2014MedicReS
Teaching Medical Research Methodology : All modern medical and public health research now requires a considerable amount of biostatistics,
computer science, data processing and machine
learning: Data Science
What every product manager needs to know about data science (ProductCamp Bost...ProductCamp Boston
Product management and data science work in close partnership at many of the highest performing organizations, which use analytics to develop better products, drive customer acquisition and retention, and create additional revenue streams. Not all organizations have the resources or desire to build an internal data science competency, but all can benefit from some degree of analytical orientation. When armed with basic quantitative literacy, product managers are often among the best positioned individuals in their organizations to recognize and build strong business cases around data related opportunities worth pursuing.
This presentation covers, through practical case studies, the information that all product managers should know about data science. Attendees will leave better able to recognize data related opportunities, understand the potential benefits and risks, build business cases for analytics, and learn more.
About Trevor Bass
Trevor Bass is a data scientist with a decade of experience building highly successful and innovative products and teams. He runs Bitten Labs, a data science management consultancy, education provider, and innovation lab. Prior to Bitten Labs, Trevor founded and built the data science function at payment processor Litle & Co (acquired by Vantiv), which performed product R&D, drove customer acquisition, retention, and upselling, provided quantitative consulting throughout the company and for its end customers, and established clear industry thought leadership. Trevor holds a Master's degree from Rutgers University and a Bachelor's degree magna cum laude from Harvard University, both in mathematics.
Why is big data all the rage? What is this "data science" that people are talking about? Why do I care — as a customer, and as someone who works at a company generating data? In this talk, I present the case for models, and how we can use data science to create and use models of our customers and the society around us.
Data Scientist: The Sexiest Job in the 21st CenturyLyn Fenex
Presentation from WUSS 2015:
“Data scientist” is often used as a blanket title to describe jobs that are drastically different. There are plenty of articles and discussions on the web about what data science is, what qualities define a data scientist, how to nurture them, and how you should position yourself to be a competitive applicant. There are far fewer resources out there about the steps to take in order to obtain the skills necessary to practice this elusive discipline. This presentation will explore a collection of freely accessible materials and content to jumpstart your understanding of the theory and tools of Data Science. We will also discuss some of the variable understandings that companies use to define the roles of their Data Scientists.
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
The term 'Data Science' was first described in scientific literature about 15 years ago. It started to become a major trend in industry about 7 years ago.
O'Reilly Media surveys the industry extensively each year. In addition we get a good birds-eye view of industry trends through our conference programs and publications, working closely with some of the best practitioners in Data Science.
By now, the field has evolved far beyond its origins eclipsing an earlier generation of Business Intelligence and Data Warehousing approaches. Data Science is moving up, into the business verticals and government spheres of influence where it has true global impact.
This talk considers Data Science trends from the past three years in particular. What is emerging? Which parts are evolving? Which seem cluttered and poised for consolidation or other change?
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/program/thu/slot-2.html
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
This is my presentation on the Topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of Study.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
13. Earliest use of “data mining”: 1962
(c) KDnuggets 2016 14
Source: Google Books
After eliminating many “following data. Mining cost is ” examples
which refer to Mining of minerals,
and books from “1958” that have a CD attached (errors in book year)
The earliest “data mining” reference I found is
37. The best data scientists have one
thing in common –
unbelievable curiosity
DJ Patil, US First Chief Data Scientist
http://www.sciencefriday.com/articles/10-questions-for-the-
nations-first-chief-data-scientist
April 2016
38
50. Lesson 8: Limits to Predicting Human
Behavior?
• Inherent randomness, complexity in human
behavior
• Individual predictions have limited accuracy
(but can still be better than random and very
useful for consumer analytics)
• Aggregate predictions (eg who will win the
election) more accurate, because individual
randomness cancels out
(c) KDnuggets 2016 51
52. Direct Marketing Lift:
Random and Model-sorted Lists
0
10
20
30
40
50
60
70
80
90
100
5
15
25
35
45
55
65
75
85
95
Random
Model
5% of random list have 5% of hits
5% of model-score ranked list have 21% of hits.
Lift(5%) = 21%/5% = 4.2
Pct list
CPH:CumulativePctHits
53. Most lift curves are surprising similar-
limit to human predictability?
Study of lift curves in banking,
telecom
Best lift curves are similar
Special point T=Target
percentage
Lift(T) ~ sqrt (1/T)
G. Piatetsky-Shapiro, B. Masand,
Estimating Campaign Benefits and
Modeling Lift, in Proceedings of
KDD-99 Conference, ACM Press,
1999.
(c) KDnuggets 2016 54
0
2
4
6
8
10
12
14
0 5 10 15 20 25
100*T%
Lift
Actual lift(T) Est. lift(T)
87. Shortage of Data Scientists?
• McKinsey (2011): shortage by 2018 in US
– 140-190,000 people with deep analytical skills
– 1.5 M managers/analysts with the know-how to
use the analysis of big data to make effective
decisions.
Source:
www.mckinsey.com/mgi/publications/big_data/
92(c) KDnuggets 2016
88. Data Scientist –
Sexiest Job of the 21st Century?
• Thomas H. Davenport and D.J. Patil, (Harvard
Business Review, 2012)
93(c) KDnuggets 2016
95. Big Data
• Next Industrial Revolution
• Data Science is the Engine of Big Data
100(c) KDnuggets 2016
96. Doing Old Things Better
Application areas
– Direct marketing/Customer modeling
– Recommendations
– Fraud detection
– Security/Intelligence
– Healthcare
– …
• Competition will level companies
101(c) KDnuggets 2016
97. Big Data Enables New Things !
• Google – first big success of big data
• Social networks (Facebook, Twitter, LinkedIn,
…) success depends on network size, i.e. big
data
• Big Data in Health-care
– image analysis, diagnosis,
– Personalized medicine
• Recommendations - Netflix streaming
102(c) KDnuggets 2016
Churn: best algorithms for predicting churn have lift of 5-7 – 5-7 times better than random.
Behavioral advertising: 2-3% CTR – 10 times better than random
Future is Bright for Big Data, but need use caution when evaluating claims