This document provides an overview of how to become a data scientist. It discusses the soft skills and technical skills required, including learning statistics, data mining, machine learning, programming languages, visualization, and domain expertise. Key steps are to learn matrix factorizations, distributed computing, statistical analysis, optimization, information retrieval, algorithms, and data structures. Mastering these technical skills involves taking online courses and practicing with tools and data.
Big Data [sorry] & Data Science: What Does a Data Scientist Do? - Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process look like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
Intro to Data Science for Enterprise Big Data - Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide some great references for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
A brief introduction to data science and machine learning with an emphasis on application scenarios, from traditional ones to the most innovative. The overview covers the basic definition of data science, an overview of machine learning, and examples of traditional scenarios, recommender systems and social network analysis, IoT, and deep learning.
Ordinary people include anyone who is not a geek like myself. This book is written for ordinary people: managers, marketers, technical writers, couch potatoes, and so on.
Data Science and Analytics for Ordinary People is a collection of blogs I have written on LinkedIn over the past year. As I continue to perform big data analytics, I continue to discover not only my weaknesses in communicating the information but also new insights into using and communicating the information obtained from analytics. These are the kinds of things I blog about, and they are collected here.
Big Data and Data Science: The Technologies Shaping Our Lives - Rukshan Batuwita
Big Data and Data Science have become increasingly important areas in both industry and academia, to the extent that every company wants to hire a Data Scientist and every university wants to start dedicated degree programs and centres of excellence in Data Science. Big Data and Data Science have led to technologies that have already shaped different aspects of our lives such as learning, working, travelling, purchasing, social relationships, entertainment, physical activities, medical treatments, etc. This talk will attempt to cover the landscape of some of the important topics in these exponentially growing areas of Data Science and Big Data, including the state-of-the-art processes, commercial and open-source platforms, data processing and analytics algorithms (especially large-scale Machine Learning), application areas in academia and industry, the best industry practices, business challenges, and what it takes to become a Data Scientist.
Everybody has heard of Big Data, and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game changing outcomes for capable organizations? Who are the skilled talent that deliver Big Data results?
This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined and his tools, workflow and challenges are reviewed. Finally, practical tips are presented for approaching data product development.
Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success
These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.
From the webinar presentation "Data Science: Not Just for Big Data", hosted by Kalido and presented by:
David Smith, Data Scientist at Revolution Analytics, and
Gregory Piatetsky, Editor, KDnuggets
These are the slides for David Smith's portion of the presentation.
Watch the full webinar at:
http://www.kalido.com/data-science.htm
Presentation at Data ScienceTech Institute campuses, Paris and Nice, May 2016, including Intro, Data Science History and Terms; 10 Real-World Data Science Lessons; Data Science Now: Polls & Trends; Data Science Roles; Data Science Job Trends; and Data Science Future
Curious about Data Science? Self-taught on some aspects, but missing the big picture? Well, you’ve got to start somewhere and this session is the place to do it.
This session will cover, at a layman’s level, some of the basic concepts of Data Science. In a conversational format, we will discuss: What are the differences between Big Data and Data Science – and why aren’t they the same thing? What distinguishes descriptive, predictive, and prescriptive analytics? What purpose do predictive models serve in a practical context? What kinds of models are there and what do they tell us? What is the difference between supervised and unsupervised learning? What are some common pitfalls that turn good ideas into bad science?
During this session, attendees will learn the difference between k-nearest neighbor and k-means clustering, understand why we normalize data and avoid overfitting, and grasp the meaning of No Free Lunch.
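To make the distinction concrete, here is a minimal pure-Python sketch, not taken from the session itself. It contrasts k-nearest neighbor (supervised: it needs labels) with k-means (unsupervised: it only groups points) and shows why features are normalized before computing distances. The data, the function names, and the naive "first k points" centroid initialization are all assumptions made for illustration.

```python
import math
from collections import Counter

def min_max_normalize(rows):
    """Rescale each column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def knn_predict(train_x, train_y, query, k=3):
    """Supervised: vote among the labels of the k closest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_means(points, k=2, iterations=10):
    """Unsupervised: group unlabeled points around k centroids."""
    centroids = list(points[:k])  # naive initialization (assumption): first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Invented data: the two features live on very different scales.
raw = [(1.0, 1000.0), (1.2, 1100.0), (0.9, 900.0),
       (5.0, 5000.0), (5.2, 5100.0), (4.8, 4900.0)]
labels = ["low", "low", "low", "high", "high", "high"]
query = (4.9, 5050.0)

# Normalize the training points and the query on the same scale.
scaled = min_max_normalize(raw + [query])
train, scaled_query = scaled[:-1], scaled[-1]

prediction = knn_predict(train, labels, scaled_query, k=3)  # uses the labels
centroids, clusters = k_means(train, k=2)                   # ignores the labels

print("k-NN prediction:", prediction)
print("k-means cluster sizes:", sorted(len(c) for c in clusters))
```

Note that `knn_predict` cannot run without `labels`, while `k_means` never sees them. Without the normalization step, the second feature (in the thousands) would swamp the first in every distance calculation.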
A Practical-ish Introduction to Data Science - Mark West
In this talk I will share insights and knowledge that I have gained from building up a Data Science department from scratch. This talk will be split into three sections:
1. I'll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organisation.
2. Next up we'll run through some commonly used Machine Learning algorithms used by Data Scientists, along with examples for use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Applications of Machine Learning at USC presentation by Alex Tellez
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro - Data ScienceTech Institute
Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
A presentation delivered by Mohammed Barakat on the 2nd Jordanian Continuous Improvement Open Day in Amman. The presentation is about Data Science and was delivered on 3rd October 2015.
My keynote talk at San Diego Superdata conference, looking at history and current state of Analytics and Data Mining, and examining the effects of Big Data
My class presentation at USC. It introduces data science, machine learning, applications, recommendation systems, and infrastructure.
Here's a starting template for anyone presenting a data science topic to elementary school students. It shows how fun the field is and how strong the job market for these skills is. Includes hyperlinks to various examples of interesting interactive visualizations.
Introduction to Data Science (Data Summit, 2017) - Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science - Mark West
Data Science has been described as the sexiest job of the 21st Century. But what is Data Science? And what has Machine Learning got to do with all this? In this session I will share insights and knowledge that I have gained from building up a Data Science department from scratch. The talk will be split into three sections:
1. I’ll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organization.
2. Next up we’ll run through some commonly used Machine Learning algorithms used by Data Scientists, along with examples for use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
Introduction to Data Science and Analytics - Srinath Perera
This webinar serves as an introduction to WSO2 Summer School. It will discuss how to build a pipeline for your organization and for each use case, and the technology and tooling choices that need to be made for the same.
This session will explore analytics under four themes:
Hindsight (what happened)
Oversight (what is happening)
Insight (why is it happening)
Foresight (what will happen)
Recording http://t.co/WcMFEAJHok
This TDWI EU 2012 presentation looks at the various options for implementing a data store for analytical purposes and shows that there's no 'one size fits all' solution available
Keys to understanding when you are looking for a Data Scientist vs. Engineer,... - Domino Data Lab
Knowing how to hire in this market is tough, and understanding what you are really looking for is key. This lightning talk will cover some of the challenges in our current market, as well as tips to make the hiring process easier. Presented by Mary Kypreos, Recruiting Manager for the Open Source & Big Data Team, Greythorn.
A Journey to Modern Apps with Containers, Microservices and Big Data - Edward Hsu
2016-10-04 Reactive Summit - Mesosphere Keynote
Enterprises hear about the promise of application containers, but realizing meaningful business results from containers requires more than abandoning virtual machines. In order to implement containers correctly, businesses must consider the operational implications, as well as the new types of applications they want to build using microservices. In this session, Ed Hsu, Vice President of Enterprise DC/OS at Mesosphere, discusses how to capitalize on new opportunities that can accelerate your IT modernization initiatives.
We hear a lot about lambda architectures and how Cassandra and Spark can help us crunch our data both in batch and real-time. After a year in the trenches, I'll share how we at The Weather Company built a general purpose, weather-scale event processing pipeline to make sense of billions of events each day. If you want to avoid much of the pain learning how to get it right, this talk is for you.
Transforming Data Management and Time to Insight with Anzo Smart Data Lake® - Cambridge Semantics
This webinar is targeted to Federal Government CIOs and staff that are researching enterprise data management and mining tools to help them understand how Smart Data Lakes enable a viable mechanism for addressing their top priorities.
Watch this recorded webinar by Richard Mallah, Director of Advanced Analytics, to learn more about advancements in Text Analytics and how our Anzo Unstructured platform helps marry unstructured text with structured data from a wide variety of sources, allowing our customers to gain significant insights and competitive advantage by more easily and efficiently extracting meaning and value from the documents and the data.
This is a slide deck that was used for our 11/19/15 Nike Tech Talk to give a detailed overview of the SnappyData technology vision. The slides were presented by Jags Ramnarayan, Co-Founder & CTO of SnappyData
How can organizations give up the keys to data systems without creating data anarchy? The answer lies in Smart Data Lakes™. Learn how Smart Data Lakes are being used to design contextual data platforms for deeper insights and problem solving, responsibly and effectively introduce self-service independence from IT, put subject matter expertise to work overcoming volume and variety challenges and enable a backbone of collaboration and sharing to improve data and insights.
Always On: Building Highly Available Applications on Cassandra - Robbie Strickland
Cassandra was built from the ground up to enable linearly scalable, always-on applications. But the path to high availability has many land mines that can mean failure for the inexperienced user. In this talk, I will offer practical advice on how to achieve 100% uptime on millions of transactions per second. I'll address all aspects of the topic, including deployment, configuration, application design, and operations.
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis - Helena Edelson
Slides from my talk with Evan Chan at Strata San Jose: NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis. Streaming analytics architecture in big data for fast streaming, ad hoc and batch, with Kafka, Spark Streaming, Akka, Mesos, Cassandra and FiloDB. Simplifying to a unified architecture.
To date, Hadoop usage has focused primarily on offline analysis--making sense of web logs, parsing through loads of unstructured data in HDFS, etc. But what if you want to run map/reduce against your live data set without affecting online performance? Combining Hadoop with Cassandra's multi-datacenter replication capabilities makes this possible. If you're interested in getting value from your data without the hassle and latency of first moving it into Hadoop, this talk is for you. I'll show you how to connect all the parts, enabling you to write map/reduce jobs or run Pig queries against your live data. As a bonus I'll cover writing map/reduce in Scala, which is particularly well-suited for the task.
A data lake promises cheap storage and ubiquitous access for all of your enterprise data. However, most organizations are struggling to make sense of the data in the lake. How do you harmonize, add meaning, govern, secure and offer business self-service to your data lake? You build a Smart Data Lake.
Semantic Graph Databases: The Evolution of Relational Databases - Cambridge Semantics
In this webinar, Barry Zane, our Vice President of Engineering, discusses the evolution of databases from Relational to Semantic Graph and the Anzo Graph Query Engine, the key element of scale in the Anzo Smart Data Lake. Based on elastic clustered, in-memory computing, the Anzo Graph Query Engine offers interactive ad hoc query and analytics on datasets with billions of triples. With this powerful layer over their data, end users can effect powerful analytic workflows in a self-service manner.
New Developments in Machine Learning - Prof. Dr. Max Welling - Textkernel
Presentation from Prof. Dr. Max Welling, Professor of Machine Learning at the University of Amsterdam, at Textkernel's Intelligent Machines and the Future of Recruitment on June 2nd in Amsterdam.
At the end of this slide deck, you can also find the YouTube recording.
Due to increased compute power and large amounts of available data, machine learning is flourishing once again. In particular a technology called deep learning is making great strides maturing into a powerful technology. Max Welling briefly discusses variants of deep learning, such as convolutional neural networks and recurrent neural networks. But what lies around the corner in machine learning? He will discuss the three developments that in his opinion will become increasingly important:
1) Learning to interact with the world through reinforcement learning,
2) Learning while respecting everyone's privacy, and
3) Learning the causal relations in data (as opposed to discovering mere correlations).
Together, they represent the "power tools" of the future machine learner.
Making Decisions in a World Awash in Data: We’re going to need a different bo... - Micah Altman
In his abstract, Scriffignano summarizes as follows:
I explore some of the ways in which the massive availability of data is changing and the types of questions we must ask in the context of making business decisions. Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise. At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly-available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways. Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best. A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g. cyber crime, cyber terrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available. Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.” Drawing on unstructured, highly dynamic sources of data, fascinating inference can be derived if we ask the right questions (and maybe use a bit of different math!). This session will cover three main themes: The new normal (how the data around us continues to change), how are we reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves). Ultimately, what we learn is governed as much by the data available as by the questions we ask. This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.
From the MarTech Conference in London, UK, October 20-21, 2015. SESSION: The Human Side of Analytics. PRESENTATION: The Human Side of Data - Given by Colin Strong - @colinstrong - Managing Director - Verve, Author of Humanizing Big Data. #MarTech DAY2
Unlocking the Potential: Data as a Medium for Design & JusticeJess Freaner
As a designer and data scientist, I work with data in service of meeting people’s needs. This data is inherently subjective, which is what makes it both an excellent medium for design and vulnerable to misuse. I’ll share what it means to design with data and how data science can contribute to and augment the design process. Once we see what’s exciting and newly possible, we’ll delve into why now, more than ever, human-centered design matters as we discuss ethics and the impact of AI designs on individuals, communities, and societies.
Talk given by Jess Freaner (IDEO) at UX Strategy Meetup in Chicago - November 2019
It's all a game: The twin fallacies of epistemic purity and the scholarly inv...Carl Bergstrom
My talk from the Feb 2016 Gaming Metrics workshop at UC Davis.
Video of the talk at http://bit.ly/1WwfMxY (begins 20:00)
For some reason the visual gradients (used to indicate gradients of behavior) have not rendered here.
In this deck from the HPC User Forum in Tucson, Steve Conway from Hyperion Research presents: The Need for Deep Learning Transparency.
"We humans don’t fully understand how humans think. When it comes to deep learning, humans also don’t understand yet how computers think. That’s a big problem when we’re entrusting our lives to self-driving vehicles or to computers that diagnose serious diseases, or to computers installed to protect national security. We need to find a way to make these “black box” computers transparent."
"We help IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. Our industry experts are the former IDC high performance computing (HPC) analyst team, which remains intact and continues all of its global activities. The group is comprised of the world’s most respected HPC industry analysts who have worked together for more than 25 years."
Watch the video: https://wp.me/p3RLHQ-it7
Learn more: http://hyperionresearch.com/ and http://hpcuserforum.com
What data scientists really do, according to 50 data scientistsHugo Bowne-Anderson
My talk at PyData NYC, 2018.
This is the abstract:
Hugo Bowne-Anderson, data scientist and host of the DataFramed podcast, will give you a view into the thinking of 50 leading data scientists from around the world about the trends driving the data science revolution. During his interviews with these thought leaders, Hugo discovered themes and lessons about the past, present, and future of data science.
AI should be Fair, Accountable and Transparent (FAT* AI), so it is crucial to raise awareness of these topics not only among machine learning practitioners but across the entire population, since ML systems can make life-changing decisions and influence our lives now more than ever.
A Theory of Knowledge Lecture given by Mark Steed, Director of JESS Dubai on Monday 4th March 2019
The lecture explains how AI works and then looks at some of the ethical implications
Keynote Analytics Week, Boston, MA November 7, 2014
Big Data is in its infancy and is opening the door to profound change - Grand Opportunities (Accelerating Scientific Discovery) and Grand Challenges to be addressed over the next decade. We explore the premise that Data Science is to data-intensive discovery as the Scientific Method is to scientific discovery, leading us to potential Laws and Limits of Data Science, and then to Best Practices.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
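The map and reduce primitives listed above can be illustrated with a small sketch. This is plain single-threaded Python, not the OpenMP or CUDA code the report measures; the function names (`multiply`, `chunked_sum`, `to_bfloat16`) are hypothetical, and the bfloat16 conversion shown here simply truncates a float32 to its top 16 bits, which is an assumption about the storage format rather than the report's method.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate bfloat16 storage by keeping the top 16 bits of the float32
    encoding (truncation toward zero; an assumed conversion for illustration)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def multiply(xs, ys):
    """Map primitive: element-wise product of two equally long vectors."""
    return [x * y for x, y in zip(xs, ys)]

def chunked_sum(xs, chunks=4):
    """Reduce primitive: sum each chunk, then combine the partial sums.
    A parallel version would hand each chunk to a separate thread or block."""
    size = max(1, (len(xs) + chunks - 1) // chunks)
    partials = [sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return sum(partials)

if __name__ == "__main__":
    values = [1.0 / (i + 1) for i in range(1000)]
    full = chunked_sum(values)
    low = chunked_sum([to_bfloat16(v) for v in values])
    print("sum with full-precision storage:", full)
    print("sum with simulated bfloat16 storage:", low)
    print("relative difference:", (full - low) / full)
```

The chunked structure mirrors how a parallel reduce splits the work; the storage-type comparison only shows that a narrower element type trades accuracy for memory traffic.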
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
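The idea in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the report's CPU or GPU implementation: all function names are hypothetical, the damping factor 0.85 and the tolerance are assumed values, components are found with Kosaraju's algorithm, and the graph is assumed to have at least one vertex and no dead ends.

```python
from collections import defaultdict

def build(n, edges):
    """Out- and in-adjacency lists plus out-degrees for vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    outdeg = [len(a) for a in adj]
    if not all(outdeg):
        raise ValueError("dead ends present: every vertex needs an out-edge")
    return adj, radj, outdeg

def components(n, adj, radj):
    """Kosaraju's algorithm; component ids come out in topological order."""
    seen, order = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(adj[w])))
                    break
            else:  # all out-neighbours explored: v is finished
                order.append(v)
                stack.pop()
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1
    return comp, count

def pagerank_on(vertices, rank, radj, outdeg, n, damping=0.85, tol=1e-10, max_iter=500):
    """Update ranks of `vertices` only, in place, until the L1 change is below tol."""
    for _ in range(max_iter):
        new = {v: (1 - damping) / n
                  + damping * sum(rank[u] / outdeg[u] for u in radj[v])
               for v in vertices}
        delta = sum(abs(new[v] - rank[v]) for v in vertices)
        for v in vertices:
            rank[v] = new[v]
        if delta < tol:
            break

def levelwise_pagerank(n, edges):
    adj, radj, outdeg = build(n, edges)
    comp, count = components(n, adj, radj)
    members = [[] for _ in range(count)]
    for v in range(n):
        members[comp[v]].append(v)
    # Level of a component = longest path to it in the block-graph.
    level = [0] * count
    for c in range(count):
        for v in members[c]:
            for w in adj[v]:
                if comp[w] != c:
                    level[comp[w]] = max(level[comp[w]], level[c] + 1)
    by_level = defaultdict(list)
    for v in range(n):
        by_level[level[comp[v]]].append(v)
    rank = [1.0 / n] * n
    for lv in sorted(by_level):  # upstream levels are final before downstream ones
        pagerank_on(by_level[lv], rank, radj, outdeg, n)
    return rank

def monolithic_pagerank(n, edges):
    _, radj, outdeg = build(n, edges)
    rank = [1.0 / n] * n
    pagerank_on(range(n), rank, radj, outdeg, n)
    return rank

if __name__ == "__main__":
    edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 4), (4, 4)]
    print(levelwise_pagerank(5, edges))
    print(monolithic_pagerank(5, edges))
```

Because each level depends only on ranks from earlier levels, a level is iterated to convergence once and never revisited, which is what removes the need for per-iteration communication across the whole graph.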
4. ...and the 5 R's of 21st Century Literacy
⇨Reading
⇨wRiting
⇨aRithmetic
⇨pRobability
⇨R
Source: Joe Blitzstein, Harvard
5. "data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation."
Kate Crawford, Microsoft Research/MIT
11. Data Scientists have more fun
Source: How to Engage and Retain Analytical Talent
By Elizabeth Craig, Jeanne G. Harris and Henry Egan
January 2010
12. How Do I Become A Data Scientist?
⇨ Learn about matrix factorizations
⇨ Learn about distributed computing
⇨ Learn about statistical analysis
⇨ Learn about optimization
⇨ Learn about machine learning
⇨ Learn about information retrieval
⇨ Learn about signal detection and estimation
⇨ Master algorithms and data structures
⇨ Practice
⇨ Study Engineering
Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
13. 6 levels of expertise needed
⇨ Data wrangling
⇨ Statistics
⇨ Data mining
⇨ Visualization
⇨ Communication
⇨ Domain & Business Expertise
Data Science*
* a bit of programming skills doesn't hurt either
30. Want to get your feet wet?
Tableau Public
http://www.tableausoftware.com/public/
SAS Visual Analytics
http://www.sas.com/software/visual-analytics
31. Where to go from here?
⇨ Read 'Competing on Analytics'
⇨ Move on to 'Data Analysis Using SQL and Excel'
⇨ Then buy 'Handbook of Statistical Analysis & Data Mining Applications'
⇨ Statistics for business:
  ⇨ http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm
⇨ Data Mining:
  ⇨ www.rapid-i.com (RapidMiner)
  ⇨ http://www.thearling.com
  ⇨ http://www.autonlab.org/tutorials/
⇨ For free text books, search www.scribd.com
⇨ Enter http://www.coursera.org
32. More Resources to Get You Started
Books:
⇨ Data Mining Techniques: For Marketing, Sales and Customer Support, Michael J. Berry and Gordon Linoff
⇨ Data Preparation for Data Mining, Dorian Pyle
⇨ Data Mining Algorithms, Eibe Frank, Ian Witten, Jim Gray
⇨ An Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
⇨ Information Retrieval, C. J. van Rijsbergen
⇨ The Visual Display of Quantitative Information, Edward R. Tufte
Journals, Newsletters, Web Sites:
⇨ SIGKDD Explorations, Newsletter of the ACM SIG on Knowledge Discovery and Data Mining
⇨ IEEE Transactions on Pattern Analysis and Machine Intelligence
⇨ SAS Knowledge Exchange: www.sas.com/knowledge-exchange/business-analytics
⇨ KDNuggets data mining resources: www.kdnuggets.com
⇨ FlowingData, visualization resources: http://flowingdata.com/
⇨ Infoaesthetics, visual design resources: http://infosthetics.com/
⇨ VisualComplexity, visualization resources: www.visualcomplexity.com/vc/index.cfm
⇨ Recommendation systems resources: http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx
⇨ The Impoverished Social Scientist's Guide to Free Statistical Software and Resources: http://maltman.hmdc.harvard.edu/socsci.shtml
33. Free Stuff So You Can Work Cheaply
⇨ WEKA http://www.cs.waikato.ac.nz/ml/weka/
⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/
⇨ Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
⇨ Parallel Sets http://eagereyes.org/parallel-sets#download
⇨ RapidMiner http://rapid-i.com/content/blogcategory/38/69/
⇨ Knime http://www.knime.org/
⇨ Orange http://www.ailab.si/Orange/
⇨ R statistics software http://www.r-project.org/
⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html
⇨ Octave numerical and matrix computation http://www.gnu.org/software/octave/
⇨ Processing http://www.processing.org/
⇨ Circos http://mkweb.bcgsc.ca/circos/
⇨ Treemap http://www.cs.umd.edu/hcil/treemap/
⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/
⇨ Dutch Students: SAS & SPSS Academic Licenses (e.g. SurfSpot.nl)
35. Web: www.sas.com
Email: jos.vandongen<at>sas.com
Phone: +31-(0)6-10172008
Skype: tholis.jos
LinkedIn: jvdongen
Twitter: josvandongen
Delicious: jvdongen
Jos van Dongen
In BI since 1991
Principal Consultant @ SAS
Author/Speaker/Analyst