data science history / data science @ NYTchris wiggins
talk delivered 2015-07-29 at ICERM workshop on "mathematics in data science"
workshop: https://icerm.brown.edu/topical_workshops/tw15-6-mds/
references: http://bit.ly/icerm
"data: past, present, and future" day 1 lecture 2020-01-20chris wiggins
What should our future statisticians, senators, and CEOs know about the history and ethics of data? How might understanding that history provide tools and resources to future citizens navigating a future shaped by data empowered algorithms? We've developed a course that introduces students, without prerequisites, to a historical view of our present condition, in which data-empowered algorithms shape our personal, professional, and political realities. The course attempts to integrate critical data studies with functional engagement with data (in Python via Jupyter notebooks), and interleaves an applied view of ethics throughout. The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
data science history / data science @ NYTchris wiggins
talk delivered 2015-07-29 at ICERM workshop on "mathematics in data science"
workshop: https://icerm.brown.edu/topical_workshops/tw15-6-mds/
references: http://bit.ly/icerm
"data: past, present, and future" day 1 lecture 2020-01-20chris wiggins
What should our future statisticians, senators, and CEOs know about the history and ethics of data? How might understanding that history provide tools and resources to future citizens navigating a future shaped by data empowered algorithms? We've developed a course that introduces students, without prerequisites, to a historical view of our present condition, in which data-empowered algorithms shape our personal, professional, and political realities. The course attempts to integrate critical data studies with functional engagement with data (in Python via Jupyter notebooks), and interleaves an applied view of ethics throughout. The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
Data! Action! Data journalism issues to watch in the next 10 yearsPaul Bradshaw
Keynote at the Nordic data journalism conference #NODA16 - an outline of issues facing data journalism which journalists and academics need to focus on in the next decade.
2018년9월27일 뉴욕에서 열린 <데이터 사이언스 살롱> 세미나 후기입니다. Viacom, BuzzFeed, Forbes, The NewYork Times등 뉴욕 기반의 미디어 업계관계자들이 데이터사이언스 프로젝트를 공유하는 자리였습니다. One More Thing으로 구글 마운틴뷰에서 본 Crayon Graph가 포함되어 있어요!
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...Stefan Dietze
Inaugural lecture at Heinrich-Heine-University Düsseldorf on 28 May 2019.
Abstract:
When searching the Web for information, human knowledge and artificial intelligence are in constant interplay. On the one hand, human online interactions such as click streams, crowd-sourced knowledge graphs, semi-structured web markup or distributional semantic models built from billions of Web documents are informing machine learning and information retrieval models, for instance, as part of the Google search engine. On the other hand, the very same search engines help users in finding relevant documents, facts, or data for particular information needs, thereby helping users to gain knowledge. This talk will give an overview of recent work in both of the aforementioned areas. This includes 1) research on mining structured knowledge graphs of factual knowledge, claims and opinions from heterogeneous Web documents as well as 2) recent work in the field of interactive information retrieval, where supervised models are trained to predict the knowledge (gain) of users during Web search sessions in order to personalise rankings. Both streams of research are converging as part of online platforms and applications to facilitate access to data(sets), information and knowledge.
data science @NYT ; inaugural Data Science Initiative Lecturechris wiggins
inaugural Data Science Initiative Lecture @ Brown University
2015-12-04
https://www.eventbrite.com/e/data-science-at-the-new-york-times-tickets-19490272931
Roger hoerl say award presentation 2013Roger Hoerl
Presentation given by Roger Hoerl when he received the Statistical Advocate of the Year Award from the Chicago Section of the American Statistical Association (ASA), May 9th, 2013.
Everybody has heard of Big Data, and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game changing outcomes for capable organizations? Who are the skilled talent that deliver Big Data results?
This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined and his tools, workflow and challenges are reviewed. Finally, practical tips are presented for approaching data product development.
Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success
These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
Developments in Education for Information: Will ‘Data’ Trigger the Next Wave ...Yasar Tonta
?” Paper presented at the International Conference on Information Management and Libraries (ICIML), November 10-13, University of the Punjab, Lahore, Pakistan.
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
The term 'Data Science' was first described in scientific literature about 15 years ago. It started to become a major trend in industry about 7 years ago.
O'Reilly Media surveys the industry extensively each year. In addition we get a good birds-eye view of industry trends through our conference programs and publications, working closely with some of the best practitioners in Data Science.
By now, the field has evolved far beyond its origins eclipsing an earlier generation of Business Intelligence and Data Warehousing approaches. Data Science is moving up, into the business verticals and government spheres of influence where it has true global impact.
This talk considers Data Science trends from the past three years in particular. What is emerging? Which parts are evolving? Which seem cluttered and poised for consolidation or other change?
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/program/thu/slot-2.html
Data! Action! Data journalism issues to watch in the next 10 yearsPaul Bradshaw
Keynote at the Nordic data journalism conference #NODA16 - an outline of issues facing data journalism which journalists and academics need to focus on in the next decade.
2018년9월27일 뉴욕에서 열린 <데이터 사이언스 살롱> 세미나 후기입니다. Viacom, BuzzFeed, Forbes, The NewYork Times등 뉴욕 기반의 미디어 업계관계자들이 데이터사이언스 프로젝트를 공유하는 자리였습니다. One More Thing으로 구글 마운틴뷰에서 본 Crayon Graph가 포함되어 있어요!
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...Stefan Dietze
Inaugural lecture at Heinrich-Heine-University Düsseldorf on 28 May 2019.
Abstract:
When searching the Web for information, human knowledge and artificial intelligence are in constant interplay. On the one hand, human online interactions such as click streams, crowd-sourced knowledge graphs, semi-structured web markup or distributional semantic models built from billions of Web documents are informing machine learning and information retrieval models, for instance, as part of the Google search engine. On the other hand, the very same search engines help users in finding relevant documents, facts, or data for particular information needs, thereby helping users to gain knowledge. This talk will give an overview of recent work in both of the aforementioned areas. This includes 1) research on mining structured knowledge graphs of factual knowledge, claims and opinions from heterogeneous Web documents as well as 2) recent work in the field of interactive information retrieval, where supervised models are trained to predict the knowledge (gain) of users during Web search sessions in order to personalise rankings. Both streams of research are converging as part of online platforms and applications to facilitate access to data(sets), information and knowledge.
data science @NYT ; inaugural Data Science Initiative Lecturechris wiggins
inaugural Data Science Initiative Lecture @ Brown University
2015-12-04
https://www.eventbrite.com/e/data-science-at-the-new-york-times-tickets-19490272931
Roger hoerl say award presentation 2013Roger Hoerl
Presentation given by Roger Hoerl when he received the Statistical Advocate of the Year Award from the Chicago Section of the American Statistical Association (ASA), May 9th, 2013.
Everybody has heard of Big Data, and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game changing outcomes for capable organizations? Who are the skilled talent that deliver Big Data results?
This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined and his tools, workflow and challenges are reviewed. Finally, practical tips are presented for approaching data product development.
Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success
These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
Developments in Education for Information: Will ‘Data’ Trigger the Next Wave ...Yasar Tonta
?” Paper presented at the International Conference on Information Management and Libraries (ICIML), November 10-13, University of the Punjab, Lahore, Pakistan.
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
The term 'Data Science' was first described in scientific literature about 15 years ago. It started to become a major trend in industry about 7 years ago.
O'Reilly Media surveys the industry extensively each year. In addition we get a good birds-eye view of industry trends through our conference programs and publications, working closely with some of the best practitioners in Data Science.
By now, the field has evolved far beyond its origins eclipsing an earlier generation of Business Intelligence and Data Warehousing approaches. Data Science is moving up, into the business verticals and government spheres of influence where it has true global impact.
This talk considers Data Science trends from the past three years in particular. What is emerging? Which parts are evolving? Which seem cluttered and poised for consolidation or other change?
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/program/thu/slot-2.html
Abstract: The emerging configuration of educational institutions, technologies, scientific practices, ethics policies and companies can be usefully framed as the emergence of a new “knowledge infrastructure” (Paul Edwards). The idea that we may be transitioning into significantly new ways of knowing – about learning and learners, teaching and teachers – is both exciting and daunting, because new knowledge infrastructures redefine roles and redistribute power, raising many important questions. What should we see when open the black box powering analytics? How do we empower all stakeholders to engage in the design process? Since digital infrastructure fades quickly into the background, how can researchers, educators and learners engage with it mindfully? This isn’t just interesting to ponder academically: your school or university will be buying products that are being designed now. Or perhaps educational institutions should take control, building and sharing their own open source tools? How are universities accelerating the transition from analytics innovation to infrastructure? Speaking from the perspective of leading an institutional innovation centre in learning analytics, I hope that our experiences designing code, competencies and culture for learning analytics sheds helpful light on these questions.
What Data Can Do: A Typology of Mechanisms
Angèle Christin .
International Journal of Communication > Vol 14 (2020) , de Angèle Christin del Departamento de Comunicación de Stanford University, USA titulado "What Data Can Do: A Typology of Mechanisms". Entre otras cosas es autora del libro "Metrics at Work.
한국언론학회 2016년 봄철학술대회의 <테마논문> 세션
이번 학술대회의 테마는 <미래>이며,
이 세션에서는 테마에 관한 초청논문이 발표되고 토론될 예정입니다.
학술대회의 여러 행사 중 가장 중요한 세션이라고 할 수 있지요~^^
4부(15:50~17:30)에 100분간 진행되며, 장소는 이화여대 ECC B225호입니다(날짜: 5월 21일 토).
100분동안 3편의 논문이 발표되며, 각 논문 당 한 분이 토론에 참여하십니다.
Introduction for skills seminar on Search and Data Mining, Master of European...Gerben Zaagsma
These are the slides for the introductory lecture that I gave as part of a skills seminar on Search and Data Mining (Luxembourg, 11 December 2014). The slides are rather visual and for the most part don’t include notes, yet I believe the gist of the talk will be clear. At the end links are included for tools, further reading and a link to the exercises we did.
a mission-driven approach to personalizing the customer journeychris wiggins
Keynote talk at PyData NYC 2019 by Anne Bauer, Lead Data Scientist, The New York Times, and Chris Wiggins, Chief Data Scientist, The New York Times.
"Data science at The New York Times: a mission-driven approach to personalizing the customer journey"
How does The New York Times use data science to further its mission?
We'll talk about the use of machine learning throughout the company,
from social media promotion to targeted advertising to content
recommendations, and the cross-team collaborations that make it
possible.
Data Science at The New York Times: what industry can learn from us; what we ...chris wiggins
Keynote talk at RSG with DREAM 2019 | November 4-6, 2019 | New York, USA | HOME - RECOMB/ISCB RSG 2019; 12th annual RECOMB/ISCB Conference on Regulatory & Systems Genomics .
pecial Session on Cancer Systems Biology
Regulatory and Systems Genomics 2019 will include an abstract submissions track for a Special Session of Cancer Systems We welcome submissions on computational and experimental advances in the systems-level modeling of cancer. Topics include but are not limited to: regulatory programs and signaling pathways in cancer cells, tumor-immune interactions and the tumor microenvironment, developmental plasticity in tumors and epigenetic analyses, tumor metabolism, genetic and non-genetic sources of heterogeneity, drug response and precision oncology. The session will include presentations from keynote speakers as well as talks from selected abstracts. This special session is sponsored by the Research Center for Cancer Systems Immunology at Memorial Sloan Kettering Cancer Center, an NCI-funded Cancer Systems Biology Consortium (CSBC) Center.
slides uploaded by request
talk presented at the MIDAS seminar, University of Michigan, 2019-04-15. Video available via https://www.youtube.com/watch?v=c7t4LMkq_SU . For more information: https://midas.umich.edu/event/chris-wiggins/
abstract
The Data Science group at The New York Times develops and deploys
machine learning solutions to newsroom and business problems.
Re-framing real-world questions as machine learning tasks requires not
only adapting and extending models and algorithms to new or special
cases but also sufficient breadth to know the right method for the
right challenge. I'll first outline how unsupervised, supervised, and
reinforcement learning methods are increasingly used in human
applications for description, prediction, and prescription,
respectively. I'll then focus on the 'prescriptive' cases, showing how
methods from the reinforcement learning and causal inference
literatures can be of direct impact in engineering, business, and
decision-making more generally.
Talk delivered 2019-06-25 as part of the Summer Institute in Computational Social Science, held at Princeton University https://compsocialscience.github.io/summer-institute/2019/
Video: https://www.youtube.com/watch?v=0suLWheVji0
title:
what should future statisticians, CEO, and senators know about the
history and ethics of data?
abstract:
What should our future statisticians, senators, and CEOs know about the history and ethics of data?
How might understanding that history provide tools and resources to future citizens navigating a future shaped by data empowered algorithms?
I'll present content from a class co-developed over the past several years with Professor Matt Jones of Columbia's Department of History, based on material absent from both the curriculum for future technologists as well as for future humanists.
The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...chris wiggins
Data-empowered algorithms are reshaping our professional, personal, and political realities.
However, existing curricula are predominantly designed either for future technologists, focusing on functional capabilities; or for future humanists, focusing on critical and rhetorical context surrounding data.
"Data: Past, Present, and Future" is a new course at Columbia which seeks to define a curriculum at present taught to neither group, yet of interest and utility to future statisticians, CEOs, and senators alike.
The intellectual arc traces from the 18th century to present day, beginning with examples of contemporary technological advances, disquieting ethical debates, and financial success powered by panoptic persuasion architectures.
The weekly cadence of the course pairs primary and secondary readings with Jupyter notebooks in Python, engaging directly with the data and intellectual advances under study.
Throughout, these intellectual technical advances are paired with critical inquiry into the forces which encouraged and benefited from these new capabilities, i.e., the political dimension of data and technology.
Syllabus, Jupyter notebooks, and additional info can be found via https://data-ppf.github.io/
"Data: Past, Present, and Future" is supported by the Columbia University Collaboratory Fellows Fund. Jointly founded by Columbia University’s Data Science Institute and Columbia Entrepreneurship, The Collaboratory@Columbia is a university-wide program dedicated to supporting collaborative curricula innovations designed to ensure that all Columbia University students receive the education and training that they need to succeed in today’s data rich world.
Data: Past, Present, and Future (Lecture 1, Spring 2018)chris wiggins
Slides from Lecture 1 of "Data: Past, Present, and Future",
Jan 17 2018.
New class on how data is impacting our professional, political, and personal realities. Taught by Profs Matt Jones and Chris Wiggins
lean + design thinking in building data productschris wiggins
talk given to "Columbia startup teams including 2016 CVC winners, Ignition Grant winners, TFP fellows, and ASCENT fellows" as part of a mini-bootcamp for founders at columbia's engineering school 2016-05-24
presentation for the NYCRIN / NSF meeting 2013-07-24,
showing 1st demo of leanworkbench.com,
open source tools to quantify and drive early stage startups,
with support from NSF award 1305023
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
22. not always obvious, cf. Hoteling 1945
“Data Analysis… what can be learned from 50 years”
Huber, 2012
23. data science: 2 definitions
1. in academia: machine learning
applied, science/research focus
2. in industry: machine learning
applied, product/software focus
24. data science @ The New York Timeshistory: 2 epochs
1. slow burn @Bell: as heretical
statistics (see also Breiman)
2. caught fire 2009-now: as job
description
25. data science @ The New York Times
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
@chrishwiggins
27. "...social activities generate large quantities of potentially
valuable data...The data were not generated for the
purpose of learning; however, the potential for learning
is great’’
28. "...social activities generate large quantities of potentially
valuable data...The data were not generated for the
purpose of learning; however, the potential for learning
is great’’ - J Chambers, Bell Labs,1993, “GLS”
35. data skills
data science and…
- data analytics
- data engineering
- data embeds
- data product
- data multiliteracies
cf. “data scientists at work”, ch 1
36. data science and…
- data analytics
- data engineering
- data embeds
- data product
- data multiliteracies
cf. “data scientists at work”, ch 1
data science and…
- data analytics
- data engineering
- data embeds
- data product
- data multiliteracies
data skills
nota bene!
these are 3 separate skill sets;
academia often conflates the 3.
In industry these are*
- 3 related functions
- 3 collaborating teams
* or at least @ NYT, 2016