What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process look like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
Intro to Data Science for Enterprise Big Data
Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
Simplilearn
This Data Science presentation will help you understand what Data Science is, why we need Data Science, the prerequisites for learning Data Science, what a Data Scientist does, the Data Science lifecycle with an example, and career opportunities in the Data Science domain. You will also learn the differences between Data Science and Business Intelligence. The role of a data scientist is one of the sexiest jobs of the century. The demand for data scientists is high, and the number of opportunities for certified data scientists is increasing. Every day, companies are looking for more and more skilled data scientists, and studies show that there is expected to be a continued shortfall in qualified candidates to fill the roles. So, let us dive deep into Data Science and understand what Data Science is all about.
This Data Science Presentation will cover the following topics:
1. Need for Data Science?
2. What is Data Science?
3. Data Science vs Business intelligence
4. Prerequisites for learning Data Science
5. What does a Data scientist do?
6. Data Science life cycle with use case
7. Demand for Data scientists
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
The Data Science with Python course is recommended for:
1. Analytics professionals who want to work with Python
2. Software professionals looking to get into the field of analytics
3. IT professionals interested in pursuing a career in analytics
4. Graduates looking to build a career in analytics and data science
5. Experienced professionals who would like to harness data science in their fields
Two-hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
Data Science is a wonderful technology that has applications in almost every field. Let's learn the basics of this domain on 16th March at (time).
Agenda
1. What is Data Science? How is it different from ML, DL, and AI?
2. Why is this skill in demand?
3. What are some popular applications of Data Science?
4. Popular tools and frameworks used in Data Science
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Edureka!
** Data Science Certification using R: https://www.edureka.co/data-science **
In this PPT on Data Science Tutorial, you’ll get an in-depth understanding of Data Science and learn how it is used in the real world to solve data-driven problems. This session covers the following topics:
Need for Data Science
Walmart Use case
What is Data Science?
Who is a Data Scientist?
Data Science – Skill set
Data Science Job roles
Data Life cycle
Introduction to Machine Learning
K-Means Use case
K-Means Algorithm
Hands-On
Data Science certification
Blog Series: http://bit.ly/data-science-blogs
Data Science Training Playlist: http://bit.ly/data-science-playlist
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
An introduction to data science: from the very beginning of the idea, through the latest designs, changing trends, and the technologies behind them, to applications that are already in real-world use as of now.
Introduction to Data Science and Analytics
Srinath Perera
This webinar serves as an introduction to WSO2 Summer School. It will discuss how to build a pipeline for your organization and for each use case, and the technology and tooling choices that need to be made for the same.
This session will explore analytics under four themes:
Hindsight (what happened)
Oversight (what is happening)
Insight (why is it happening)
Foresight (what will happen)
Recording http://t.co/WcMFEAJHok
My class presentation at USC. It gives an introduction to data science, machine learning, applications, recommendation systems, and infrastructure.
Data science is different from Data Analytics, Data Engineering, and Big Data.
Presentation about Data Science.
What is Data Science? Its process, future, and scope.
Data Science Presentation By Amit Singh.
"Sexiest job of 21st century"
Data Science Training | Data Science For Beginners | Data Science With Python...
Simplilearn
This Data Science presentation will help you understand what Data Science is, who a Data Scientist is, what a Data Scientist does, and how Python is used for Data Science. Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining. This Data Science tutorial will help you establish your skills in analytical techniques using Python. With this Data Science video, you’ll learn the essential concepts of Data Science with Python programming and also understand how data acquisition, data preparation, data mining, model building and testing, and data visualization are done. This Data Science tutorial is ideal for beginners who aspire to become Data Scientists.
This Data Science presentation will cover the following topics:
1. What is Data Science?
2. Who is a Data Scientist?
3. What does a Data Scientist do?
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. A data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn’s Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques. Those who complete the course will be able to:
1. Gain an in-depth understanding of data science processes, data wrangling, data exploration, data visualization, hypothesis building, and testing. You will also learn the basics of statistics.
2. Install the required Python environment and other auxiliary tools and libraries.
3. Understand the essential concepts of Python programming such as data types, tuples, lists, dicts, basic operators and functions.
4. Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
Learn more at: https://www.simplilearn.com
In this presentation, I talk briefly about Big Data and its importance. I have included the very basics of Data Science and its importance in the present day, through a case study. You can also get an idea of who a data scientist is and what tasks a data scientist performs. A few applications of data science are illustrated at the end.
Data Scientist is the sexiest job of the 21st century, and the Big Data concept is going to rule the 21st century. This presentation gives an overview of data science and big data.
Being able to make data-driven decisions is a crucial skill for any company. The requirements are growing tougher: the volume of collected data keeps increasing by orders of magnitude, and the insights must be smarter and faster. Come learn more about why data science is important and what challenges data teams need to face.
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
This presentation explains what data engineering is and briefly describes the phases of the data lifecycle. I used this presentation during my work as an on-demand instructor at Nooreed.com
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
This talk is about how we applied deep learning techniques to achieve state-of-the-art results in various NLP tasks like sentiment analysis and aspect identification, and how we deployed these models at Flipkart.
Introduction to Mahout and Machine Learning
Varad Meru
This presentation gives an introduction to Apache Mahout and Machine Learning. It presents some of the important Machine Learning algorithms implemented in Mahout. Machine Learning is a vast subject; this presentation is only an introductory guide to Mahout and does not go into lower-level implementation details.
Myths and Mathemagical Superpowers of Data Scientists
David Pittman
Some people think data scientists are mythical beings, like unicorns, or they are some sort of nouveau fad that will quickly fade. Not true, says IBM big data evangelist James Kobielus. In this engaging presentation, with artwork created by Angela Tuminello, Kobielus debunks 10 myths about data scientists and their role in analytics and big data. You might also want to read the full blog by Kobielus that spawned this presentation: "Data Scientists: Myths and Mathemagical Superpowers" - http://ibm.co/PqF7Jn
For more information, visit http://www.ibmbigdatahub.com
Tutorial on Deep learning and Applications
NhatHai Phan
In this presentation, I would like to review basic techniques, models, and applications in deep learning. I hope you find the slides interesting. Further information about my research can be found at https://sites.google.com/site/ihaiphan/
NhatHai Phan
CIS Department,
University of Oregon, Eugene, OR
What is the impact of Big Data on Analytics from a Data Science perspective?
Presented at the Big Data and Analytics Summit 2014, Nasscom by Mamatha Upadhyaya.
Everybody has heard of Big Data, and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game changing outcomes for capable organizations? Who are the skilled talent that deliver Big Data results?
This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined and his tools, workflow and challenges are reviewed. Finally, practical tips are presented for approaching data product development.
Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success
These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.
You've heard the news: Data Science is the cool new career opportunity sweeping the world. Come learn from Thinkful Mentors all about this new and exciting industry.
Today, we have data – lots of it. We can process information – in many ways. And with these two tools and a little bit of creativity, we are discovering the vast depths of human behavior and by extension, a way to accurately predict the future -- and our future happiness. In fact, we can quantify human movement, behaviors, desires, and even moods on a scale that wasn’t possible before a series of advances in processing power, developments in psychology and social network science, and most importantly, access to data.
In advertising, industry, and humanity, we have experienced the evolution from Web 1.0 (informational) to Web 2.0 (platform) to Web 3.0 (semantic) to elements of Web 4.0 (anticipatory) – In this anticipatory era, what can we dream of next? Beyond addressability and increasing ad relevance, how can businesses utilize these advances in product development and other market initiatives? Can we make the leap from inductive logic to human-paralleled intuition? Can this make up for our human brain mechanics that make predicting our own happiness so difficult?
In this talk we’ll cover the evolutions in data access, models for information processing, and the science of collaboration to see not only how they have been leveraged in businesses but also how they are used to better understand human behavior, and hopefully in the near future, a little bit of happiness.
IIPGH Webinar 1: Getting Started With Data Science
ds4good
In this webinar for ICT Professionals Ghana, we explore the concepts of data science and its motivations as a recent specialization, creating the background for how Artificial Intelligence relates to Machine Learning and to Deep Learning. We further discuss the data science technology stack and the opportunities that exist in the space.
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Dr. Sunil Kr. Pandey
This is my presentation on the Topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of Study.
“Big Data” is a term that’s come from nowhere in the last 5 years or so, and is now practically ubiquitous within IT. But is it useful or even meaningful? Doesn’t it put too much emphasis on size over content or value? Does it add anything to discussions at all? Or does it actually impede communication, by obscuring crucial differences between diverse kinds of data that all require different tools, algorithms and strategies?
(Talk presented at "Big Data for the Public Sector and Business Enterprise", London 2013)
Data Scientist has been regarded as the sexiest job of the twenty first century. As data in every industry keeps growing the need to organize, explore, analyze, predict and summarize is insatiable. Data Science is creating new paradigms in data driven business decisions. As the field is emerging out of its infancy a wide range of skill sets are becoming an integral part of being a Data Scientist. In this talk I will discuss the different driven roles and the expertise required to be successful in them. I will highlight some of the unique challenges and rewards of working in a young and dynamic field.
Bringing Machine Learning and Knowledge Graphs Together
Six Core Aspects of Semantic AI:
- Hybrid Approach
- Data Quality
- Data as a Service
- Structured Data Meets Text
- No Black-box
- Towards Self-optimizing Machines
Big Data vs. Small Data...what's the difference?Anna Kuhn
What is big data? A 3-pg summary of the key differences between "big data" and "small data."
Includes comparison of data jargon, high level technologies, staffing / people, and the nature of the data itself.
Perfect for data-savvy marketers & agencies, and beginner-to-intermediate data and analytics professionals.
How to Feed a Data Hungry Organization – by Traveloka Data TeamTraveloka
In Traveloka's Inaugural Data Meetup held in April 2017, Ainun Najib (Head of Data), Dr. Philip Thomas (Lead Data Scientist), and Rendy B. Junior (Lead Data Engineer) shared about the journey that Traveloka's Data Team have taken so far so that the audience can learn from the struggles and triumphs in managing Traveloka's burgeoning data.
You will learn more about:
1) Data culture in Traveloka
2) Data engineering in Traveloka
3) Data science in Traveloka
To follow our LinkedIn page, visit bit.ly/TravelokaLinkedInPage
Safe Harbor Statement
Our discussion may include predictions, estimates or other information that might be considered conclusive. While these conclusive statements represent our current judgment on the best practices, they are subject to risks and uncertainties that could cause actual results to differ materially. You are cautioned not to place undue reliance on our statements, which reflect our opinions only as of the date of this presentation. Please keep in mind that we are not obligating ourselves to revise or publicly release the results of any revision to these presentation materials in light of new information or future events.
Yo. big data. understanding data science in the era of big data.Natalino Busa
We talk a lot these days about data science, and how it will pave our paths with beautiful insights and unexpected new relations and connections in our given datasets, and even across datasets.
But how to maintain the "Science" part in "Data Science"? After some time working in this field I appreciate more and more the critical thinking which has characterized the progress in science.
Hypothesis, facts, prove and/or disprove the thesis: this is how science has progressed over the past centuries. The method was formalized by Popper, who categorized as non-science all disciplines whose statements cannot be falsified. In other words, if a statement cannot be disproved, we cannot talk of science, since there is no mechanism left to verify the solution or to prove it wrong.
When that happens the argument can still be accepted, but not scientifically accepted. Ways of accepting or refuting a non-falsifiable statement are, for instance, based on aesthetic, authority, pragmatic, or philosophical considerations. All valid, but not scientific. This applies for instance to statements in the disciplines of politics, theology, ethics, etc.
Science has definitely progressed since then. For instance, Bayesian networks and statistical induction are now part of the (data) scientist's arsenal. But no matter how the baseline is set, critical thinking and a rigorous method are definitely helpful in understanding the results produced by science, in particular when it is based on large amounts of data and computational in nature, rather than formula/model driven.
Data Science has currently many different connotations. On one side it praises the "artistry", the genius of laying out connections between disciplines and concepts. This is a truly great aspect of scientists and creativity is definitely very welcome in all data science profiles.
Along with the fun of creating new insights and new data golden eggs, a data scientist has to put up with those annoying criteria of reproducibility, falsifiability and peer review. Sometimes these elements are postponed or left behind in the name of artistry. Granted, it's just hard to find metrics and baselines in order to compare models and data science solutions. But the scientific method has proven to be solid over the centuries, allowing factual discussion between scientists and selection between models based on objective, agreed criteria.
Similar to Big Data [sorry] & Data Science: What Does a Data Scientist Do? (20)
Standardizing +113 million Merchant Names in Financial Services with Greenplu...Data Science London
Talk by Ian Andrews & Mike Goddard @Greenplum at Data Science London 28/11/2012. A financial services case on how to standardize merchant names with RegEx & fuzzy matching
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
1. Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Carlos Somohano
Founder Data Science London
@ds_ldn
datasciencelondon.org
The Cloud and Big Data: HDInsight on Azure London 25/01/13
3. Man on the Moon – Small Data!
Computer Program
Apollo XI
Man on the Moon
Date: 1969
Speed: 3,500 km/hour
Distance: 356,000 Km
64 Kb, 2Kb RAM, Fortran
Weight: 13,500 kg
Never been there before
Must work 1st time
Lots of complex data
Must return to Earth
4. Apollo XI, 1969
SkyDive Stratos, 2012
64 Kb
Tens of Gigabytes
Think About It – We live in Crazy Times!
6. What is Big Data? IT mumbo-jumbo
A fashionable term typically used by some IT
vendors to remarket old-fashioned software &
hardware
7. What is Big Data? The n-Vs
Volume …
Variety …
Velocity …
(add your own V here…)
So What?
8. Change! Water Cooler Chat
We need to parallelize data operations but it’s too costly & complex …
The business can’t get access to all the relevant data, we need external data…
We can’t match customer master data to live customer interactions…
We can’t just force everything into a star-schema…
These BI reports and charts don’t tell us anything we didn’t know…
We are missing the ETL window, the data we needed didn’t arrive on time…
We can’t predict with confidence if we can’t explore data & develop our own models
9. What is Big Data? Force of Change
Big Data forces you to change the way you collect,
store, manage, analyze and visualize data
11. Big Data = Crude Oil [not New Oil]
Think of data as ‘crude oil.’
Big Data is about extracting the ‘crude oil,’
transporting it in ‘mega-tankers,’ siphoning it through
‘pipelines,’ and storing it in massive ‘silos’…
All ‘this’ is about IT Big Data… fine and well…
… BUT
12. You need to refine the ‘crude oil’
Enter Data Science…
13. The Science [and Art] of…
Discovering what we don’t know from data
Obtaining predictive, actionable insight from data
Creating Data Products that have business impact now
Communicating relevant business stories from data
Building confidence in decisions that drive business value
14. Brief History of Data Science
6th C BC - 1st C BC – The Greeks! Pyrrhonism, Skepticism & Empiricism…
1974 – Peter Naur @UoC Datalogy Data Science
2001 – William S. Cleveland @CSU Data Science: An Action Plan …:
2002 – Committee on Data for Science and Technology (CODATA)
2003 – Journal of Data Science
2009 – Jeff Hammerbacher @ Facebook What does a Data Scientist Do?
2010 – Drew Conway @NYU The Data Science Venn Diagram
2010 – Hilary Mason & Chris Wiggins @Dataists “
2010 – Mike Loukides @O’Reilly “What is Data Science?”
2011 – DJ Patil @LinkedIn data scientist vs. data analyst
15. Jeff Hammerbacher, 2009
“... on any given day, a team member could author a
multistage processing pipeline in Python,
design a hypothesis test, perform a regression analysis
over data samples with R,
design and implement an algorithm for some data-
intensive product or service in Hadoop, or
communicate the results of our analyses to other
members of the organization.”
16. Mike Loukides, 2010
Data science enables the creation of data
products.
Whether... data is search terms, voice samples, or
product reviews,... users are in a feedback loop in
which they contribute to the products they use.
That's the beginning of data science.
17. Hilary Mason & Chris Wiggins, 2010
Data science is clearly a blend of the hackers’ arts, statistics
and machine learning...;
and the expertise in mathematics and the domain of the
data for the analysis to be interpretable...
It requires creative decisions and open-mindedness in a
scientific context.
19. DJ Patil, 2011
“We realized that as our organizations grew, we both had to figure out
what to call the people on our teams. ‘Business analyst’ and ‘Data analyst’
seemed too limiting.
The focus of our teams was to work on data applications that would have
an immediate and massive impact on the business.
The term that seemed to fit best was data scientist: those who use both
data and science to create something new.”
21. The Duck – Billed Platypus
The Data Scientist – Billed Platypus
22. The Platypus – Billed Data Scientist
Machine Learning
Hacking
Statistics
Math
Visualization
Science
Programming
Data Mining
The Data Scientist – Billed Platypus
24. Class DataScientist {
Is skeptical, curious. Has inquisitive mind
Knows Machine Learning, Statistics, Probability
Applies Scientific Method. Runs Experiments
Is good at Coding & Hacking
Able to deal with IT & Data Engineering
Knows how to build data products
Able to find answers to known unknowns
Tells relevant business stories from data
Has Domain Knowledge
}
26. 10 Things [most] Data Scientists Do
1 Ask Good Questions. What is What…
…we don’t know?
…we’d like to know?
2 Define and Test a Hypothesis. Run experiments
3 Scoop, Scrape, Sink, Sample Business Relevant Data
4 Munge and Wrestle Data. Tame Data
5 Explore Data, Discover Data Playfully. Discover unknowns.
6 Model Data. Model Algorithms.
7 Understand Data Relationships
8 Tell the Machine How to Learn from Data
9 Create Data Products that Deliver Actionable Insight
10 Tell Relevant Business Stories from Data
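Item 2 in the list above (define and test a hypothesis, run experiments) can be made concrete with a small sketch. The code below is one possible illustration, not something taken from the slides: it runs a two-sided permutation test on two made-up groups of measurements. The function name `permutation_test` and the sample values are hypothetical.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution, so
    shuffling the group labels should not change the difference much.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical experiment: a metric measured under two variants.
variant_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
variant_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]
p_value = permutation_test(variant_a, variant_b)
```

A small `p_value` suggests the observed difference would be unusual if the variant made no difference. It does not say why the difference exists, which is where the skepticism in the list comes in.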
29. [Some] Data Science Principles
1 Socio-Technical Systems (STS) are complex!
2 Data is never at rest
3 Data is dirty, deal with it
4 SVoT = LOL!
5 Data munging & data wrestling: 70% of the time
6 Simplification. Reduction. Distillation
7 Curiosity. Empiricism. Skepticism
30. Knowns & Unknowns
There are known knowns. These are things we know
that we know.
There are known unknowns. That is to say, there are
things that we know we don't know.
But there are also unknown unknowns. There are
things we don't know we don't know
Donald Rumsfeld
31. DIKUW FTW!
D I K U W
Data Information Knowledge Understanding Wisdom
PAST FUTURE
Data Engineer
Data Analyst
Data Miner
Data Scientist
Raw What How to Why When
Numbers Description Experience Cause Effect Prediction
Letters Context Tested Proven What’s best
Known Unknown
Symbols Relationship Instruction Unknowns
Unknowns
Known Knowns
Signals Reports Programs models
32. Data Discovery
Data Analyst
Data Scientist
The new reality for Business Intelligence and Big Data, Applied Data Labs
33. Data Models vs. Algorithmic Models
Data Modeling | Algorithmic Modeling
Y ← F(X, random noise, parameters) | Y ← Black Box ← X (e.g. Random Forests)
We understand the world | We don’t understand the world
How well ‘my data model’ works | The world produces data in a black-box
Statisticians, Data Analysts, Data Miners | Data Scientists
Linear Regression | Machine Learning, AI Neural Nets
Logistic Regression | Random Forests, SVM, GBT
Known Distributions | Unknown Multivariate Distributions
Confidence Intervals | Iterative
Predictor Variables, Goodness of Fit | Predictive Accuracy
“Statistical Modeling: The Two Cultures” Leo Breiman, 2001
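To make the two cultures a little more tangible, here is a minimal sketch under stated assumptions. The "world" is a synthetic noisy sine curve (invented for this example), the data-modeling side fits a straight line by least squares, and the algorithmic side uses a k-nearest-neighbours predictor as a simple stand-in for the random forests named on the slide. Both models are judged only on held-out predictive accuracy.

```python
import math
import random

def fit_linear(xs, ys):
    """Data modeling: assume y = a + b*x + noise, estimate a and b by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    """Algorithmic modeling: no assumed formula, predict from the k nearest training points."""
    points = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)

def world(x):
    """Synthetic 'world' (an assumption of this sketch): a sine curve plus noise."""
    return math.sin(x) + rng.gauss(0, 0.1)

train_x = [rng.uniform(0, 6) for _ in range(200)]
train_y = [world(x) for x in train_x]
test_x = [rng.uniform(0, 6) for _ in range(100)]
test_y = [world(x) for x in test_x]

linear_mse = mse(fit_linear(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)
```

Here the black-box predictor has the lower held-out error because the straight-line assumption is wrong for this world. If the world really were linear, the data model would predict about as well and would also be easier to explain.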
34. Learning from Data is Tricky
Statistical vs. Machine Learning
Supervised vs. Unsupervised Learning
Induction vs. Deduction
Sampling & Confidence Intervals
Probability Distribution
Deviation & Variance
Correlation vs. Causation
Causation & Prediction
35. More Data or Better Models?
More Data Beats Better Algorithms, Omar Tawakol @BlueKai
Better Algorithms Beat More Data, Mark Torrance @RocketFuel
More Data or Better Models, Xavier Amatriain @Netflix
On Chomsky and the Two Cultures of Statistical Learning, Peter Norvig @Google
Specialist Knowledge Is Useless and Unhelpful, Jeremy Howard @Kaggle
37. Data Science Process - 1
1 Known Unknowns?
2 We’d like to know…?
3 Outcomes?
4 What Data?
5 Hypothesis?
The World: Product Manufactured; Goods shipped; Product purchased; Phone Calls Made; Energy Consumed; Fraud Committed; Repair Requested; System
Ingest Raw Data: Transactions; Web-Scraping; Web-clicks logs; Sensor Data; Mobile Data; Docs, Emails, XLS; Social Feeds, RSS; Flume Sink HDFS
Munch Data: MapReduce; ETL, ELT; Data Wrangle; Data Cleansing; Data Jujitsu; Dim Reduction; Sample; Select, Join, Bind
The Dataset: Independency? Correlation? Covariance? Causality? Dimensionality? Missing Values? Relevant?
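The cleansing step between raw data and the dataset can be sketched as follows. This is an illustrative toy, not a recipe from the slides: the rows, the column names (`store`, `units`, `temp_c`) and the choice of median imputation are all assumptions made for the example.

```python
from statistics import median

# Hypothetical raw rows as they might arrive from an ingest step.
raw_rows = [
    {"store": " London ", "units": "12", "temp_c": "18.5"},
    {"store": "london", "units": "12", "temp_c": "18.5"},   # duplicate once cleaned
    {"store": "Leeds", "units": "", "temp_c": "16.0"},      # missing units
    {"store": "Leeds", "units": "7", "temp_c": "n/a"},      # missing temperature
    {"store": "York", "units": "9", "temp_c": "15.5"},
]

def to_float(text):
    """Return a float, or None when the field is blank or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

def clean(rows):
    # 1. Normalize text fields and parse numbers.
    parsed = [
        {"store": r["store"].strip().title(),
         "units": to_float(r["units"]),
         "temp_c": to_float(r["temp_c"])}
        for r in rows
    ]
    # 2. Drop exact duplicates while keeping the original order.
    seen, unique = set(), []
    for r in parsed:
        key = (r["store"], r["units"], r["temp_c"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 3. Fill missing numeric values with the column median (one simple choice).
    for col in ("units", "temp_c"):
        fill = median(r[col] for r in unique if r[col] is not None)
        for r in unique:
            if r[col] is None:
                r[col] = fill
    return unique

dataset = clean(raw_rows)
```

Whether to impute, drop, or flag missing values depends on the question being asked, so the median fill here should be read as a placeholder for that decision.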
38. Data Science Process - II
The Dataset
Explore Data
Represent Data
Discover Data
Deliver Insight
Learn From Data
Data Product
Visualize Insight
Description Inference
Objectives
Data Algorithm Models
Levers
Actionable
Machine Learning
Modeling
Predictive
Networks Graphs
Simulation
Immediate Impact
Regression Prediction
Optimization
Business Value
Classification Clustering
Visualization
Easy to explain
Experiments Iteration
40. A Data Product Is…
… Curated and crafted from raw data
… A result of exploration and iterations
… A machine that learns from data
… An answer to known unknowns or unknown unknowns
… A mechanism that triggers immediate business value
… A probabilistic window of future events or behavior
41. Data Jiu-Jitsu
Data
Jiu Jitsu Fight
$$$$
Data Product
Data Scientist
Data Jiu-Jitsu: ability to turn big data into data products that generate immediate business value
(DJ Patil @LinkedIn)
42. Developing Data Products
Objectives: What Outcome Am I Trying to Achieve?
Levers: What Inputs Can We Control?
Data: What Data Can We Collect?
Models: How the Levers Influence the Objectives
Adapted from “Designing Great Data Products. The Drivetrain Approach: A Four Step Approach to Building Data Products”
Jeremy Howard, Margit Zwemer, Mike Loukides, 2012
43. Objective-Based Data Products
What Outcome Am I Trying to Achieve?
Actionable Outcome
Data
Modeler
Simulator
Optimizer
The Model Assembly Line
Adapted from “Designing Great Data Products. The Drivetrain Approach: A Four Step Approach to Building Data Products”
Jeremy Howard, Margit Zwemer, Mike Loukides, 2012
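The modeler, simulator and optimizer stages can be sketched in a few lines. Everything specific below is an assumption made for illustration: the objective is profit, the single lever is price, and the demand curve and unit cost are invented numbers rather than anything from the Drivetrain article.

```python
def demand_model(price):
    """Modeler: hypothetical linear demand curve (units sold at a given price)."""
    return max(0.0, 1000.0 - 80.0 * price)

def simulate_profit(price, unit_cost=2.0):
    """Simulator: what happens to the objective if the lever is set to `price`?"""
    return (price - unit_cost) * demand_model(price)

def optimize_price(candidate_prices):
    """Optimizer: pick the lever setting with the best simulated outcome."""
    return max(candidate_prices, key=simulate_profit)

# Lever: the price we control, searched on a grid from 2.00 to 12.00.
candidates = [2.0 + 0.25 * i for i in range(41)]
best_price = optimize_price(candidates)
best_profit = simulate_profit(best_price)
```

In a real data product the demand model would be learned from collected data, and the grid search would likely give way to a proper optimizer. The point of the sketch is only the separation between the objective, the lever, and the model that links them.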
45. Customer Lifecycle Value
Optimize CLV
Product Recommendations
Visualizer
Data
Modeler
Simulator
Optimizer
1 Products the customer may like
2 Price Elasticity
3 Probability of Purchase w/o Recommendation
4 Purchase Sequence
5 Causality Model
6 Patience Model
Adapted from “Designing Great Data Products. The Drivetrain Approach: A Four Step Approach to Building Data Products”
Jeremy Howard, Margit Zwemer, Mike Loukides, 2012
46. Automated Fruits Procurement
Confirm Purchase Orders
In less than 2 hours
Safety Stock levels?
Demand vs Stock?
Price vs. Demand?
12,000 stores
Anomalies?
300 Fruits
Fruit Shortages?
Avg. Shelf life 3 days
Fruit Write-offs?
Adapted from Blueyonder
47. Strawberries & the Weather
No sales vs X,XXX sales predicted
Why these huge stock write-offs?
A Predictive Model that calculates
strawberry purchases based on
Weather forecast
Sudden increase in temperature
Store temperature
Freezer sensor data
Remaining stock per shelf life
Sales TPoS feeds
Web searches, social mentions
Adapted from Blueyonder
48. Personalized Social Recommendations
Collaborative Filtering: Matching Skills to People
Prediction: Personalized Skills Recommendation
Adapted from “Developing Data Products” by Peter Skomoroch 5 Dec, 2012 Copyright LinkedIn
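As a rough illustration of collaborative filtering for skill recommendations, the sketch below scores the skills a member lacks by how similar that member is to the people who list them. It is a toy user-based variant with cosine similarity over made-up profiles, and makes no claim about how LinkedIn's system works. The member names and the `recommend_skills` function are hypothetical.

```python
import math

# Hypothetical data: each member's set of listed skills.
profiles = {
    "ana":   {"python", "statistics", "hadoop"},
    "ben":   {"python", "statistics", "machine learning"},
    "chloe": {"python", "hadoop", "machine learning", "visualization"},
    "dev":   {"sales", "marketing"},
}

def cosine(a, b):
    """Cosine similarity between two skill sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend_skills(member, profiles, top_n=3):
    """Score skills the member lacks by the similarity of the members who have them."""
    own = profiles[member]
    scores = {}
    for other, skills in profiles.items():
        if other == member:
            continue
        sim = cosine(own, skills)
        for skill in skills - own:
            scores[skill] = scores.get(skill, 0.0) + sim
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [skill for skill, score in ranked[:top_n] if score > 0]

suggestions = recommend_skills("ana", profiles)
```

Skills held only by dissimilar members score zero and are dropped, which is why nothing from the unrelated `dev` profile is suggested to `ana`.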
49. Colas: In Which US State Do I Invest Mktg. $?
What the Business Analyst Sent
What the Data Scientist did…
50. The Great Pop vs. Soda Page
http://www.popvssoda.com/
53. Interested in Data Science?
Join our community
http://www.meetup.com/Data-Science-London/
Follow us on Twitter
@ds_ldn
Check out our blog
http://datasciencelondon.org