The elusive 'Data Scientist' is a word that pops up more and more. Is this a buzzword or is something really changing in the world? Piet Daas of the CBS will take us on a tour of the changes that he sees around him.
New data sources for statistics: Experiences at Statistics Netherlands.
1. New data sources for
statistics
Experiences at Statistics Netherlands
Piet Daas, Marko Roos, Chris de Blois,
Rutger Hoekstra, Olav ten Bosch, and Yinyi Ma
NTTS 2011
2. Why new data sources?
• Many NSIs (national statistical institutes) traditionally use:
• Surveys
• Administrative data (registers)
• But there are other sources of information out
there (especially the electronic ones)
• Are they really useful?
• Investigate it! (studies are supported by DG of Stat. Neth.)
NTTS2011 New data sources for statistics: Exp. Stat. Neth.
3. Examples of new data sources
The first three will be discussed:
1. Product prices on the internet
2. Mobile phone location data
3. Twitter text messages
4. Global Positioning System (GPS) data
(and traffic loop information)
4. 1) Product prices on the internet
• Ubiquitous on the World Wide Web
• Stat. Neth. already uses web price data for the
Consumer Price Index:
• E.g. prices of airline tickets, books, CDs, DVDs
• But this data is manually collected
• Why not automatically (and more often)?
• With a web robot (web spider)
• Which is a script or (commercial) tool
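A minimal sketch of what such a script might do once a page has been downloaded, using only the Python standard library. The sample page, the `price` class name, and the helper names are assumptions made for illustration; real websites are structured differently, which is exactly why scripts need maintenance.

```python
from html.parser import HTMLParser


def parse_price(text):
    """Convert a Dutch-style price string such as '€ 1.129,50' to a float."""
    cleaned = text.replace("€", "").strip()
    # Dutch notation: '.' separates thousands, ',' is the decimal mark.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return float(cleaned)


class PriceParser(HTMLParser):
    """Collect the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # 'price' is a hypothetical class name chosen for this example.
        if ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(parse_price(data))


def extract_prices(html):
    """Return all prices found in an HTML document, in page order."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


# Made-up page fragment standing in for a downloaded airline fare page.
SAMPLE_PAGE = """
<html><body>
  <div class="fare"><span class="dest">AMS-LIS</span>
    <span class="price">€ 129,50</span></div>
  <div class="fare"><span class="dest">AMS-JFK</span>
    <span class="price">€ 1.129,00</span></div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_prices(SAMPLE_PAGE))
```

The parser depends on the exact markup of the page, so a layout change silently breaks it; a daily run would need a check that at least one price was found.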
5. Product prices on the internet (2)
Collected data:
• Daily (over a 10-month period)
• 6 websites: 4 airlines, 1 housing, and 1 (unstaffed) petrol station site
Works well but:
• Some websites are very complicated
• Especially dynamic websites; if possible, tap directly into the database
• Sometimes websites change their layout
• 3 of the 4 airline websites did this (in the test period)
• Affects the cost efficiency (redesigning scripts takes a lot of time)
• Manual data collection is much easier and cheaper!
• But: Automatic data collection has its own merits
6. Product prices on the internet (3)
Example: airline ticket prices (shown over a 116-day period)
[Figure: airline ticket prices over time; annotation: manually collected price, 1 day before departure date]
7. 2) Mobile phone location data
• Almost everybody has a mobile phone
• in the Netherlands ~92%
• People use it a lot
• For every outgoing and incoming call the phone
connects to a nearby telephone mast (‘cell’)
• Source of location information!
• Every mobile phone and every mast has a unique ID
• This data is logged by mobile phone companies
• Used for billing purposes & network maintenance
8. Mobile phone location data (2)
• Could be an interesting source of information
• Let's study a dataset
• Obtained data from 1 large Dutch mobile phone company
• over 5 million different phones active on their own network
• Dataset covered a 14-day period
• Contained 550 million records (call events)
• Every record contains a unique phone ID, date-time stamp, and
mast (cell) connection ID (= location info)
• Phone IDs were scrambled to avoid identification
• Scrambled IDs were stable over the 14-day period
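A sketch of how records of this shape could be pseudonymised and aggregated. The slides do not say how the phone IDs were scrambled; the keyed hash (HMAC-SHA256) below is an assumption, chosen because it gives the same pseudonym for the same phone over the whole period. The key, phone numbers, timestamps, and cell IDs are all made up.

```python
import hashlib
import hmac
from collections import Counter
from datetime import datetime

# Hypothetical secret that would stay with the data provider.
SECRET_KEY = b"hypothetical-secret-held-by-the-provider"


def scramble_id(phone_id, key=SECRET_KEY):
    """Keyed hash: the same phone always maps to the same pseudonym."""
    digest = hmac.new(key, phone_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def calls_per_cell_hour(records):
    """Count call events per (cell ID, hour of day)."""
    counts = Counter()
    for _phone, timestamp, cell_id in records:
        hour = datetime.fromisoformat(timestamp).hour
        counts[(cell_id, hour)] += 1
    return counts


# Made-up call events: (phone ID, date-time stamp, cell ID).
RAW_EVENTS = [
    ("0612345678", "2010-05-03T08:15:00", "cell-A"),
    ("0612345678", "2010-05-03T08:40:00", "cell-A"),
    ("0687654321", "2010-05-03T08:55:00", "cell-B"),
    ("0612345678", "2010-05-03T17:05:00", "cell-B"),
]

records = [(scramble_id(p), t, c) for p, t, c in RAW_EVENTS]
counts = calls_per_cell_hour(records)

if __name__ == "__main__":
    for (cell, hour), n in sorted(counts.items()):
        print(cell, hour, n)
```

Because the pseudonym is stable, the movement of one (unidentified) phone between cells can still be followed across the period, which is what makes the individual-level analyses possible.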
9. Mobile phone location data (3)
Typical day in the Netherlands (call activity)
10. Movie of call activity during the day
11. Mobile phone location data (4)
This type of location information could possibly
be used to study:
• Overall day time movement of people
• Perhaps also: movement of individual mobile phones during the
day
• Distinguish regions of different economic activity
• Different behaviour during the week and on ‘specific’ days
• For tourism
• Roaming info: Activity of non-‘Dutch’ phones in the Netherlands
• BUT!
12. Mobile phone location data (5)
However: methodological issues
1. Translation of call intensity per cell to call intensity
per region (cell coverage area)
2. Representativity
• Phone IDs vs. Dutch population
• Only used data of one mobile phone company
• Some people are more active callers than others
3. How does the number of calls relate to the number of
people present at the location?
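One simple way to approach issue 1 is area-weighted allocation: each cell's calls are spread over regions in proportion to the share of the cell's coverage area that falls in each region. This is a sketch of that general idea, not necessarily the method used in the study; the overlap fractions, call counts, and region names below are invented.

```python
from collections import defaultdict


def calls_per_region(cell_calls, overlap):
    """Distribute each cell's calls over regions in proportion to the
    fraction of the cell's coverage area that lies in each region."""
    region_calls = defaultdict(float)
    for cell_id, calls in cell_calls.items():
        for region, fraction in overlap[cell_id].items():
            region_calls[region] += calls * fraction
    return dict(region_calls)


# Hypothetical call counts per cell.
CELL_CALLS = {"cell-A": 100, "cell-B": 40}

# Hypothetical share of each cell's coverage area per region;
# the fractions for one cell sum to 1.
OVERLAP = {
    "cell-A": {"Utrecht": 0.75, "De Bilt": 0.25},
    "cell-B": {"De Bilt": 1.0},
}

result = calls_per_region(CELL_CALLS, OVERLAP)

if __name__ == "__main__":
    print(result)
```

The allocation preserves the total number of calls, but it assumes callers are spread evenly over a cell's coverage area, which is itself a methodological choice.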
13. 3) Twitter text messages
• Social media is used more and more
intensively in the Netherlands & Europe
• Potential source of personal information,
opinions, and sentiments
• But what type of information is actually
exchanged?
• Investigated Twitter (as an example)
• Easily accessible (text) data, and used a lot in
the Netherlands
14. Twitter text messages (2)
• Twitter is a micro blogging service
• Text messages of 140 characters max
• Called ‘tweets’
• Posted to the public or to friends only
• The hash sign (#) is used to highlight ‘keywords’
• Example: #Eurostat, #NTTS
• A few examples
15. Twitter text messages (3)
• Identify topics discussed on Twitter in the
Netherlands by collecting ‘tweets’
• Use this information to decide if Twitter (and
perhaps social media in general) is of interest for
Stat. Neth.
• Collect tweets from ‘all’ Dutch Twitter users!
• People located in the Netherlands
• Use location info provided by users
• Try to get a complete overview
16. Twitter text messages (4)
• Collect tweets
• First studies used the Twitter search option
• Radius of 200 km from Utrecht + Dutch language filter
• BUT: Twitter appears to apply a ‘quality filter’, so the data was incomplete
• Best alternative: first ‘crawl’ for users, then collect tweets
• Search through the user tree: select users with a large number of
followers (‘friends’), then expand the search from these users
• A user is Dutch if the location includes ‘Netherlands’ or the name of a
Dutch municipality
• Collected 380,415 unique usernames
• For every user, collect up to 200 tweets; obtained ~12 million tweets
• Identify topics discussed (first approach: used hashtags, manually)
• ~1.8 million tweets contained a hashtag
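The location rule and the hashtag step above can be sketched as follows. The municipality set is a tiny illustrative subset, the sample tweets are invented, and plain substring matching is an assumption of this sketch; the slides do not describe how the matching was implemented.

```python
import re
from collections import Counter

# Tiny illustrative subset; a real list would hold all Dutch municipalities.
MUNICIPALITIES = {"amsterdam", "utrecht", "rotterdam"}

HASHTAG = re.compile(r"#(\w+)")


def is_dutch(location):
    """Rule from the slide: the user-provided location mentions
    'Netherlands' or the name of a Dutch municipality."""
    text = location.lower()
    return "netherlands" in text or any(m in text for m in MUNICIPALITIES)


def hashtag_counts(tweets):
    """Count hashtags over all tweets, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet))
    return counts


def share_with_hashtag(tweets):
    """Fraction of tweets that contain at least one hashtag."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if HASHTAG.search(t)) / len(tweets)


# Invented sample tweets.
SAMPLE_TWEETS = [
    "Kijken naar #WK2010 #oranje",
    "Hup #Oranje!",
    "Goedemorgen allemaal",
    "Weer vertraging #ns",
]

if __name__ == "__main__":
    print(is_dutch("Utrecht, NL"), is_dutch("London"))
    print(hashtag_counts(SAMPLE_TWEETS).most_common())
    print(share_with_hashtag(SAMPLE_TWEETS))
```

Substring matching is crude: it misses users who leave the location empty or write it differently, and it can match unrelated text, so the resulting user set is an approximation of 'all' Dutch users.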
17. Twitter text messages (5)
• Results of #hashtag classification
[Figure: two pie charts of hashtag categories, one for all #hashtags (Other: 72%) and one for the top 500 #hashtags (Other: 38%)]
Table 1. Classification of Twitter messages collected according to hashtags used
Category | Description | Examples | Top 500# only (%) | Top 500# no Other (%) | All# no Other (%)
Twitter | Twitter/internet specific language & slang | #durftevragen, #fail, #twexit | 12 | 19 | 19
Sports | Sports, clubs, and sports events | #WK2010, #ajax, #oranje | 9 | 14 | 14
Applications | Twitter specific programs | #nowplaying, #lastfm, #in | 8 | 13 | 12
Politics | Political debates, leaders, and parties | #tk2010, #NOSdebat, #formatie | 7 | 11 | 11
TV | Dutch TV-programs (no political & no news) | #dwdd, #ohohcherso, #tvoh | 6 | 10 | 11
Emotions | Sentiment and feelings | #moe, #LOL, #zucht, #heerlijk | 6 | 10 | 10
Locations | References to a location or municipality | #amsterdam, #utrecht | 3 | 5 | 5
Products | Referring to products | #iPhone, #iPad, #android | 3 | 4 | 4
Events | Non-sport and non-political happenings | #twibbon, #LL10, #lowlands | 3 | 4 | 4
News | Referring to news programs | #nos, #pownews, #Nujij | 2 | 4 | 4
Companies | Referring to companies | #ns, #google, #tmobile, #KPN | 2 | 4 | 4
Radio | Dutch radio programs | #3fm, #53j8, #radio1 | 1 | 2 | 2
Other | Rest group, mostly unrelated tags | #koffie, #goedemorgen | 38 | - | -
18. Twitter text messages (6)
• First conclusions (based on topics identified by
using hashtags)
• Potentially interesting for politics and events
• Overall, the study suggests ~5% of our total dataset
• Around 600,000 tweets
• Twitter could probably also be used:
• for info on social and cultural participation and on social cohesion
• Need to further refine our studies
• More in depth studies of all tweets collected (also without #)
• Use (more advanced) text mining techniques for classification
19. Overall remarks
• Many interesting new data sources out there
• But it's not easy to determine their usefulness
• Automatically collect product prices from the web
• Only in addition to the traditional manual process
• To obtain more & more frequent data
• Mobile phone & Twitter data
• Representativity of the data is a key issue
• Not all (Dutch) people are observed
• Hardly any background information available
• Is a major topic in future research
20. Thank you for your attention!
• #Questions?