Talk given to Harrisonburg High School Students about Data and Data Science
Note: some slides had animations in Excel, so unfortunately, the images overlap on the SlideShare version.
Introduction to Data Science Talk Given to Girl Develop It! Central VA members
Note: some slides had animations in Excel, so unfortunately, the images overlap on the SlideShare version.
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019), by Paul Bradshaw
This document provides an overview of web scraping, including why it is useful for journalists, tools that can be used, and tips for identifying opportunities for scraping different types of data sources. Scraping can be used to automate collection of repetitive data, gather information from multiple pages or documents, and provide long-term insight into issues by monitoring changes over time. The document discusses factors like patterns in web pages, URL structures, and data formats that indicate whether a site may be suitable for scraping and provides some examples of sites that have been or could be scraped.
This document provides a collection of links to various data visualization websites and resources, including examples of maps, charts, and infographics related to topics like Facebook friendships, bike usage in San Francisco, comparisons of scholarly databases, radiation levels, food stamp usage, and more. Basic design principles for data visualization like color, composition, alignment and repetition are discussed. The document encourages exploring these resources and designing an infographic on candy related data to present to the group.
Machine Intelligence - Wie Systeme lernen und unseren Alltag verändern (How Systems Learn and Change Our Everyday Lives), by Mark Cieliebak
Mark Cieliebak gives a presentation on machine intelligence and how systems learn and change everyday life. He discusses smart speakers, autonomous robots, AlphaGo beating a Go champion, drones, and machine learning algorithms. Cieliebak also covers neural networks, unsupervised learning, reinforcement learning, and applications of machine learning like sentiment analysis, predicting medical side effects, and measuring hand hygiene effectiveness. He concludes that incredible progress has been made in recent years, especially with deep learning since 2010.
In September 2019 I spoke at the SciCAR conference in Dortmund, Germany, about the experience of working in a country with well-developed open data policies, and the dangers this presents for data journalists.
In this session we'll dive into the journey that Google chose to take in order to focus on AI: what the mindset was, what the challenges were, and what the direction is for the future.
Semantics, Deep Learning, and the Transformation of Business, by Steve Omohundro
Deep learning is likely to have a big impact on business. McKinsey predicts that AI and robotics will create $50 trillion of value over the next 10 years. Over $1 billion of venture investment has gone to 250 deep learning startups over the past year. Deep learning systems have recently broken records in speech recognition, image recognition, image captioning, translation, drug discovery and other tasks. Why is this happening now and how is it likely to play out? We review the development of AI and the pendulum swings between the "neats" and the "scruffies". We describe traditional approaches to semantics through logics and grammars and the new deep learning vector semantics. We relate it to Roger Shepard's cognitive geometry and the structure of biological networks. We also describe limitations of deep learning for safety and regulation. We show how it fits into the rational agent framework and discuss what the next steps may be.
VLAB Talk: AI, Deep Learning, and the Future of Business, by Steve Omohundro
AI and robotics are poised to create $50 trillion of value in the next 10 years. This is causing hundreds of startups to be created with billions of dollars of investment. Over 250 of these are based on "deep learning neural networks". This talk explores the impact of AI and the recent successes of deep learning.
AI & The Future of Work - Work & Life in the Age of Robots, by Volker Hirsch
The slides to my keynote at the annual conference for the Association of Business Psychology (ABP), held in London on 14 Oct 2016. It's a "shock & awe" take on what's coming and why we need to be alert to those changes.
Getting started with infographics for students, by Lorena Swetnam
This document provides steps for creating an infographic. It begins by defining what infographics are and providing examples of different types. It then lists tools that can be used to create infographics like Google Presentations. The main part of the document outlines a 6 step process for making an infographic: 1) exploring types, 2) researching and collecting data, 3) sketching a draft, 4) gathering images and citations, 5) selecting colors and fonts, 6) creating the infographic and adding citations. Each step includes guidance, images for examples, and tips. It concludes with a bibliography of cited sources.
10 ways AI can be used for investigations, by Paul Bradshaw
The document discusses various ways that artificial intelligence can be used to assist with journalistic investigations and reporting. It provides examples of AI being used to find patterns in large datasets, analyze text and images, generate automated summaries, and more. However, it also notes challenges like ensuring accuracy of AI systems and the need for quality control of algorithmic outputs.
Exosphere Chile Talk: Semantics, Deep Learning, and the Transformation of Bus..., by Steve Omohundro
McKinsey predicts that AI and robotics will create $50 trillion of value over the next 10 years. Many predict that the recent technology of “deep learning” will be a big part of the transformation. Over 250 deep learning startup companies have attracted more than $1 billion of venture investment in the past year. Deep learning systems have recently broken records in speech recognition, image recognition, image captioning, translation, drug discovery and other tasks. Why is this happening now and how is it likely to play out? We review the development of AI and the pendulum swings between the “neats” and the “scruffies”. We describe traditional approaches to semantics through logics and grammars and the new deep learning vector semantics. We relate it to Roger Shepard’s cognitive geometry and the structure of biological networks. We also describe limitations of deep learning for safety and regulation. We show how it fits into the rational agent framework and discuss what the next steps may be.
This document provides guidance on creating infographics. It defines infographics and lists different types including statistical, timeline, process, and gameboard infographics. It discusses standards alignment and includes examples of student work. Tools for creating infographics like Google Presentation and Piktochart are presented. A six step process for creating infographics is outlined: 1) explore types, 2) research and collect data, 3) sketch draft, 4) gather images, 5) pick colors and fonts, 6) create and cite. Specific guidance is given for each step, including researching sources, citing images, using appropriate colors and fonts, and ensuring citations are included.
Data Journalism (City Online Journalism wk8), by Paul Bradshaw
The document provides an overview of data journalism including what it is, sources for finding data, and tools for analyzing and visualizing data. It discusses scraping data from websites, using tools like Google searches, spreadsheets, and APIs to extract structured data. Ethical considerations around scraping are also mentioned. The document concludes with assigning students to group blogs and individual strategies focusing on different aspects of online journalism.
Mid-Tennessee Region Future Technology Presentation, by Jason Griffey
The document discusses emerging technologies and their potential impacts. It describes concepts like Moore's Law, wearable computing, sensors, autonomous robots, and possible futures for libraries including serving as privacy spaces, data hubs, archives, and activists. The presentation argues we should embrace change and act like we already live in a science fiction future with crazy near technologies that will change everything.
Mashups & Data Visualizations: The New Breed of Web Applications, by Darlene Fichter
Web 2.0 is opening the doors to tools and toolkits for do-it-yourself (DIY) programming that requires no knowledge of programming. Find out what mashups are and how libraries are making use of them to create rich, new information services and content. Look at some of the intriguing and robust new data visualization tools, such as IBM’s alphaWorks, Swivel, Gapminder (bought by Google), etc., that can put the power of spreadsheets online for everyone in your organization to present their information as tag clouds, bar and pie charts, bubble maps, and more.
Presented by Darlene Fichter October 31, 2007 at Internet Librarian 2007
The document discusses open data from Fingal County Council's perspective. It provides details on Fingal's open data portal including the 170 datasets across 12 categories and apps created through an open data app competition. It also discusses Dublin region's open data network, examples of data reuse, and steps for government agencies to publish open data including assigning responsibility, releasing data without restrictions, and engaging communities.
Presentation shared by author at the 9th EDEN Research Workshop "Forging new pathways of research and innovation in open and distance learning: Reaching from the roots" held on 4-6 October 2016, in Oldenburg, Germany.
Find out more on #EDENRW9 here: http://www.eden-online.org/2016_oldenburg/
Computer matching involves comparing data from multiple sources to identify matches. The FBI developed computer matching in 1976 to search employee records in Chicago and find cases of fraud. The FBI and government health agencies were responsible for managing early computer matching programs, which involved techniques such as passport and photo matching. Future matching may involve scanning body features for identification.
The document discusses data, data science, and finding data sources. It defines data as raw facts about the world and notes that data comes from various sources like government, scientific research, citizens, and private companies. It then discusses the growth of digital data and issues around open data. The document defines data science as using analysis methods to describe facts, detect patterns, and test hypotheses. Finally, it provides tips on finding needed data, such as searching open data sources, APIs, scraping, and joining datasets.
Harnessing Human Semantics at Scale (updated), by Lora Aroyo
The document appears to be a series of tweets and posts by Lora Aroyo discussing data science and crowdsourcing techniques. Some key points discussed include harnessing human semantics at scale through crowdsourcing and nichesourcing, measuring quality and reproducibility of crowdsourced results, and experimenting with different task designs and payment models to assess their impact. Specific examples mentioned include using crowdsourcing to add detailed annotations to museum collections and to find "blindspots" in AI models through a data challenge.
This document discusses various topics related to web visualization including concepts, methods, examples and technologies. It provides links to examples of radio visualization, maps, timelines, platforms and visualization APIs. It also discusses concepts in information design, workflows for data visualization, and user experience models.
Doug Caruso, assistant metro editor at The Columbus Dispatch, prepared this presentation on producing data-driven enterprise stories off your beat for Columbus, Ohio, NewsTrain on Oct. 21, 2017. It is accompanied by a handout of the same title. NewsTrain is a training initiative of Associated Press Media Editors (APME). More info: http://bit.ly/NewsTrain
How news organizations are using data to tell, by peterverweij
This document discusses the history and techniques of data journalism. It notes that data journalism has been practiced for over 50 years under different names like precision journalism and analytical journalism. The common goal is to use scientific tools to improve reporting quality. Examples are given of data-driven stories by news organizations on topics like riots, traffic accidents, school performance, and crime. The document recommends that journalists learn basic data skills like Excel, data scraping, mapping, and visualization. It also advises starting small with available data and hiring specialists for advanced data analysis. The key is for journalists to use numbers and data alongside text to tell compelling stories.
The document discusses the growing problem of "affluenza" in America, where consumerism and the pursuit of luxury and material goods have become epidemic. It suggests that simplicity is being replaced by excess and that people are no longer content to live within their means. As a result, many Americans are deep in debt and are sacrificing important values like health, relationships, and experiences for empty consumption and the prioritization of possessions over life experiences.
20210623 Digital Technologies and Innovations in Education, by Ramesh C. Sharma
Digital technology and innovation are rapidly changing the world. Artificial intelligence, machine learning, and big data are fueling the fourth industrial revolution. New technologies like augmented and virtual reality are enhancing learning. Learning analytics tools are providing insights to improve education. The future of learning will be highly personalized and lifelong, enabled by technologies like AI assistants, adaptive learning apps, and blockchain-backed credentials.
Making friends with big data resource links, by Heather Stark
This document provides a list of over 30 links to resources about big data, analytics, social games, and related topics. The links cover free reports on big data, digital advertising, how Google and Facebook operate, NoSQL databases, cloud computing, data-driven design, social network structure, analytics in games, A/B testing, and more. The document aims to share useful information and perspectives for understanding big data and its applications.
HackerEarth is pleased to announce its next session to help you understand what it really takes to become a data scientist.
The agenda of this session will include answers to the following questions:
- Why is it the best time to take up Data Science as a career?
- How can you take the first step in Data Science? (After all, the first step is always the hardest!)
- How can you become better and progress fast?
- How is life after becoming a Data Scientist?
Speaker:
Jesse Steinweg-Woods will soon be a Senior Data Scientist at tronc, working on recommender systems for articles and on understanding customer behavior. Previously, he worked at Argo Group Insurance on new pricing models that took advantage of machine learning techniques. He received his PhD in Atmospheric Science from Texas A&M University, where his research focused on numerical weather and climate prediction.
Biomedical Data Science: We Are Not Alone, by Philip Bourne
This document discusses biomedical data science and the opportunities and challenges presented by new developments in data science. Some key points:
- We are at a tipping point where biomedical research is no longer the sole leader in data science due to advances in many other fields. Biomedical researchers need to become data scientists to stay relevant.
- Data science is being driven by the massive growth of digital data and requires an interdisciplinary approach. It is touching every field and attracting many students.
- Developing effective data systems and infrastructure is a major challenge to enable open sharing and analysis of data. Initiatives are underway but more collaboration is needed across sectors.
- Advances in machine learning, like Alpha
From DARPA to Shakespeare: All the Data we Can Handle, by Kimberly Hoffman
This document discusses the opportunities and challenges of big data for libraries, researchers, and digital humanities. It notes that big data is growing exponentially from sensors, internet data, and scientific instruments. Libraries and librarians have new roles to play in data management, curation, and research data services. Researchers need help with data literacy, data management plans, and archiving research data. Digital humanities can use big data and visualization to gain new insights. Standards like TEI and services like data repositories are important to enable access and reuse of data.
An internal talk for VisualDNA about Smart Cities, Open Data, Connected Objects, Quantified Self, Internet of Things, and other ideas for the future of Big Data.
Presented in December 2013.
Great quotes on the power, use and analysis of data, from historical leading thinkers (Einstein) to contemporary ones (Tim O’Reilly and DJ Patil).
The skills implications of Cognitive Computing, by Dale Lane
1) Cognitive computing systems learn from interactions with people and data to perform tasks like thinking and making complex decisions with large amounts of information.
2) These systems will change how people interact with technology and require new skills like designing systems that can learn to solve problems and access necessary data.
3) Cognitive computing will be highly disruptive and enable new combinations of human and computer intelligence beyond what either could do alone.
This document discusses various tools and techniques for visualizing data, including websites like FlowingData.com that provide examples of data visualizations. It lists many different skills that are useful for data visualization, such as design, computer science, statistics, and teaching. Throughout the document are examples of visualizations and infographics on topics like food stamps, genetics, and radiation levels.
1. Christian Heilmann discusses key steps for building a successful web product: having a creative idea, finding people to build it, and getting discovered by people.
2. He argues that instead of focusing on these steps, one should "shift your focus" to using existing web technologies to build and promote your product.
3. Some strategies he recommends include using APIs to allow flexible development and access to data, leveraging Yahoo Query Language (YQL) as a way to test your product before fully developing infrastructure, and releasing free tools and content to promote discovery by developers.
Text taken from http://ict4peace.org/?p=3106
ICT4Peace’s Sanjana Hattotuwa has been invited to give a lecture on 27 March 2014 at the International Relations and Security Network (ISN) of ETH Zurich on “Big Data, ICTs and New Media in Times of Crisis”.
The International Relations and Security Network (ISN) is one of the world’s leading open access information services for both professionals and students who focus on international relations (IR) and security studies.
Established in 1994, its mission is to facilitate security-related dialogue and cooperation within a trusted network of international relations organizations, professionals and experts, and to provide open-source international relations and security-related tools and materials in accessible ways. The ISN is a project of the Center for Security Studies (CSS), at the Swiss Federal Institute of Technology (ETH Zurich).
This document discusses how open data about UK MPs' expenses was analyzed and visualized in different ways using tools like Google Spreadsheets, Yahoo Pipes, and Flickr. Various techniques are described for enriching, distributing, and making the data more fluid and interactive, including geocoding, linking to other data sources, and creating visual dashboard feeds. The overall message is that with freely available tools, anyone can start analyzing and rewiring open government data.
GAN for business value @ Data Science Milan
Alex Honchar
This document discusses generative modeling beyond using GANs to generate creepy images on Twitter. It provides examples of using generative modeling for tasks like generating video game levels, text, molecules, and music. Generative modeling involves modeling a distribution to generate new data points, while discriminative modeling involves classifying data points. Generative modeling has value in data augmentation, anonymization, and translating data from one domain to another.
Strata Conference NYC 2013 Full Version
Taewook Eom
The document provides an overview of topics discussed at the Strata Conference in 2013, including keynotes, sessions, and speakers. It discusses big data technologies like Hadoop, NoSQL, data science, and real-time stream processing. Some highlights include discussions on defining data science roles, predicting human vs machine performance, organizing data-centric companies, and the future of Hadoop.
Future Technological Practices: Medical Librarians’ Skills and Information Structures for Continued Effectiveness in a Changing Environment
Patricia F. Anderson, Skye Bickett, AHIP, Joanne Doucette, Pamela R. Herring, AHIP, Judith Kammerer, AHIP, Andrea Kepsel, AHIP, Tierney Lyons, Scott McLachlan, Ingrid Tonnison, and Lin Wu, AHIP
Cetis13 Analytics and Institutional Capabilities - Intro
Martin Hawksey
The document discusses analytics and institutional capabilities. It outlines a presentation on learning analytics that includes lightning talks, a discussion on data access and roles, and an exercise where participants choose images related to dreams, nightmares, or fairy dust visions of the future of learning analytics. Participants are asked to add stories or comments to the images they tag.
Similar to "What is Data Science?" High School Version (20)
6. Stored in Databases on Servers in Data Centers
http://pixabay.com/static/uploads/photo/2014/03/13/01/12/datacenter-286386_640.jpg
https://c2.staticflickr.com/2/1296/533233247_b6baa30fdb_z.jpg?zz=1
“The Cloud”
13. Data Scientist
Computer Scientist
• Gathering data
• Writing Code
• Designing Interfaces
• Design / Manage / Query Databases
• Data Mining
Mathematician
• Statistics
• Predictive Analytics
• Data Visualizations
• Evaluating Results
Business Person
• Domain Expertise
• Knowing what questions to ask
• Interpreting results for business decisions
• Presenting outcomes
No one person needs to have all of these skills.
More organizations are now building data science teams.
16. Some other names for “Data Scientist”
Statistician
Data Mining Specialist
Biostatistician
Social Science Researcher
Big Data Analyst
Spatial/GIS Analyst
Natural Language Processing Researcher
Computational Physicist
Pythonista
Financial Analyst
Recommendation System Engineer
Information Architect
Artificial Intelligence Researcher
Neuroscientist
Data Visualization Designer
17. Data Science jobs pay an average of $118,000 per year
It is estimated that by 2018, US could have a shortage of 140,000+ people with advanced analytical skills & need 1.5M managers/analysts that can make decisions based on data analysis
18. Examples
Galaxy Classification from Images
http://benanne.github.io/2014/04/05/galaxy-zoo.html
Choosing Audience for Content Promotion on Facebook
http://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
Predicting Seizures
https://www.kaggle.com/c/seizure-detection
March Madness Picks
https://www.kaggle.com/c/march-machine-learning-mania-2015
22. It’s actually really hard for computers to do things humans consider simple! We train them using “machine learning”
https://www.linkedin.com/pulse/machine-learning-image-detectioncats-vs-dogs-amrith-kumar
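To give a feel for what “training” means, here is a minimal sketch of one of the simplest machine learning ideas, a nearest-neighbor classifier. Everything in it is an assumption made for illustration: the measurements are made up, and real cat-vs-dog image detection works on pixels with much larger models. The point is only that the program is never told a rule; it labels a new animal by comparing it to labeled examples.

```python
import math

# Made-up training examples: (weight in kg, ear length in cm) -> label.
# These numbers are invented for illustration only.
training_data = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 12.0), "dog"),
    ((30.0, 14.0), "dog"),
    ((9.0, 10.0), "dog"),
]

def classify(features):
    """Label a new animal with the label of its closest training example."""
    _, nearest_label = min(
        training_data,
        # Straight-line distance between the new animal and each example
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label

print(classify((4.2, 6.2)))    # cat
print(classify((25.0, 13.0)))  # dog
```

Adding more labeled examples changes the answers without changing the code, which is the sense in which the computer “learns” from data.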
23. "Once you start working in robotics, you realize that things that kids learn to do up to age 10 ... are actually the hardest things to get a robot to do."
-Pieter Abbeel, AI Researcher and Professor at UC Berkeley
25. Different places you can start if you’re interested in data science
Programming
Any language is good to start with! Just start coding!
Most common: Python or R
Database design, SQL
Math
Statistics
Linear Algebra
Research and Analysis
Science involving data collection and interpretation
Working with “messy” real life data
Business Analytics
Data Mining
Others
Business / Communication
Graphic Design
26. Learning Resources
Doing Data Science by Cathy O’Neil* & Rachel Schutt
Data Smart by John Foreman* (uses Excel)
Blogs & News Feeds (FlowingData.com is a good one to start with)
Podcasts
Twitter – look for curated lists of people to follow
https://twitter.com/BecomingDataSci/lists/women-in-data-science/members
Online courses like those on DataCamp, Codecademy, Coursera
TED talks on Data http://www.ted.com/search?q=data
Practice w/public data sets on sites like data.gov
Volunteer opportunities via DataKind
Ask me… I have plenty more!
27. Find me online:
@becomingdatasci
“Data Science Renee” on Twitter
BecomingADataScientist.com
DataSciGuide.com
Becoming a Data Scientist Podcast & Learning Club
28. Questions?
Or want a copy of these slides and links?
Renée Teate
renee@becomingadatascientist.com
@becomingdatasci on Twitter
Editor's Notes
Created
Collected
The type of data we’re talking about is digital, stored in computers
Radio telescope
All of that has to be stored somewhere, and organized for access and analysis
But let’s not get ahead of ourselves… back to the “data being stored and related” part
Social Media – posts, friends/follows, likes/favorites, location-tagged images
Note: often other people are generating this data about you (tags, mentions, etc.)
Online Shopping – “other customers who purchased this also purchased….”, even just browsing the website, clicking, spending time on a page – usually all of that data is tracked.
Ever noticed when you leave an online store, the items you looked at “follow” you around the internet via ads?
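One simple way an “other customers who purchased this also purchased” list could be built is by counting which items show up together in the same order. The sketch below is only an illustration under that assumption: the order history is made up, the function names are hypothetical, and real stores likely combine many more signals (browsing, clicks, time on page).

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical order history: each inner list is one customer's order.
orders = [
    ["tent", "sleeping bag", "flashlight"],
    ["tent", "sleeping bag"],
    ["flashlight", "batteries"],
    ["tent", "flashlight", "batteries"],
]

def build_co_purchase_counts(orders):
    """Count how often each pair of items appears in the same order."""
    counts = defaultdict(Counter)
    for basket in orders:
        # set() so an item repeated within one order is only counted once
        for item, other in permutations(set(basket), 2):
            counts[item][other] += 1
    return counts

def also_purchased(counts, item, top_n=2):
    """Return the items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(top_n)]

counts = build_co_purchase_counts(orders)
print(also_purchased(counts, "tent"))
```

With this made-up history, the suggestions for "tent" are "sleeping bag" and "flashlight" (each bought with a tent twice); the order of tied items is not guaranteed.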
Travel – purchase tickets, check in, post on social media, rental car with GPS, hotel rooms, credit card at restaurant, generating data everywhere you go
-credit card fraud alerts when in new location
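As a rough illustration of the “new location” alert, here is a minimal sketch of a rule that flags a purchase made in a city where the card has never been used before. The rule, the card IDs, and the purchases are all assumptions for the example; real fraud detection relies on far more data than location alone.

```python
from collections import defaultdict

# Hypothetical rule: alert when a card is used in a city it has
# never been used in before.
seen_cities = defaultdict(set)  # card id -> cities where the card was used

def check_transaction(card_id, city):
    """Return True if this purchase should trigger a 'new location' alert."""
    known = seen_cities[card_id]
    # The very first purchase on a card has no history to compare against.
    is_new_location = bool(known) and city not in known
    known.add(city)
    return is_new_location

# Made-up purchases: (card id, city)
purchases = [
    ("card-1", "Harrisonburg"),
    ("card-1", "Harrisonburg"),
    ("card-1", "Orlando"),
    ("card-1", "Orlando"),
]
alerts = [check_transaction(card, city) for card, city in purchases]
print(alerts)  # [False, False, True, False]
```

Only the first purchase in the new city triggers an alert; after that, the city is part of the card's history.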
Cell phones constantly generating data – app usage, location, websites, alarms, games, photos, etc.
Now that I’ve gotten you thinking about data, specifically YOUR data, let’s think about some ways in which having your data collected (and aggregated) can help you:
-Navigation (Google Maps directions)
-Recommendations (Yelp, Netflix)
-Medical Diagnoses
-Alerts
How are these generated? ALGORITHMS
Downside
-Some sites now charging different customers different prices based on browsing history http://www.fastcoexist.com/3037888/where-and-how-youre-online-shopping-changes-the-prices-you-see?utm_source=facebook
-Any data could be hacked (such as health or financial records) and lead to loss of privacy. The more places it’s stored, the more vulnerable it is.
Who writes these algorithms?
-Experts in Machine Learning – Computer Scientists – Data Scientists!
They’re often using statistical models. Who develops those?
-Mathematicians – Statisticians – Data Scientists!
Why do they write them?
-Sometimes altruistic or experimental, but usually to make someone money!
Who is using these results to make money?
-Business People – Marketers – Data Scientists!
Note: you don’t have to be the expert in all of these areas
The bulk of data science jobs appear to be in the financial (credit cards) and marketing (ad serving) realms. However, that seems to be changing now that every company seems to be looking into whether it should have a data scientist on staff. Pick some areas you’re interested in, and search the internet for people in that area in data jobs.
Also, there are now organizations like DataKind for data scientists and analysts to volunteer their time and skills to help solve problems in arenas outside their “day job” field, such as non-profits and cities.
Note – in the last one they did a pilot study, and the extra care cut readmission rates in half
Did I have all of these when I graduated? No – had basic Stats & Calculus, basic VB programming, Database Design, ISAT projects
But this is what I would have taken more of had I known about data science then.
If you’re already well-versed in one area, either get more advanced, or get more breadth (recommended). If you’re a math major, take a science research course. If you’re a CS major, take a business course. Etc.
Didn’t have these great online courses when I was in school.
Here’s a blog post by Trey Causey with good info on getting started: http://treycausey.com/getting_started.html
Interview with Jawbone Data Scientist Abe Gong about using Data Science to solve human problems: http://www.datascienceweekly.org/data-scientist-interviews/using-data-science-solve-human-problems-abe-gong-interview