Web Science. An introduction

The Web Science course focuses on the study of large-scale socio-technical systems associated with the World Wide Web. It considers the relationship between people and technology, the ways that society and technology complement one another and the way they impact on broader society. These analyses are inherently associated with Big Data management issues.
The course is organised in four parts.

1. Syntax
In the first part, the course introduces the basics of content analysis. It focuses on the syntactic aspects, covering the fundamentals of natural language processing and text mining. It describes the structure and typical characteristics of the different web sources, spanning search results, social media content, social network structures, Web APIs, and so on. It also provides an overview of the basic Web analysis techniques applied in Web search and Web recommendation.
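As a small illustration of the syntactic level, the sketch below tokenizes a few short texts and counts term frequencies, one of the simplest text mining steps. It is a minimal sketch and not part of the course material: the sample posts, the tiny stop-word list, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequencies(documents):
    """Count how often each non-stop-word token occurs across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


# Made-up sample texts standing in for web or social media content.
posts = [
    "The Web is a source of data",
    "Social media data and Web APIs",
]
freqs = term_frequencies(posts)
```

Real pipelines would typically add language-aware tokenization, stemming or lemmatization, and a weighting scheme such as TF-IDF on top of raw counts.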

2. Semantics
In the second part, the course presents semantic technologies. These technologies matter because they address the "variety" dimension of Big Data, i.e., they enable the integration of multiple, diverse sources of information, which is typical of the modern Web platform. Covered topics include:
- RDF - a flexible data model to represent heterogeneous data
- OWL - a flexible ontological language to model heterogeneous data sources
- SPARQL - a query language for RDF.
This part shows how to put all the pieces together in order to achieve interoperability among heterogeneous information sources.
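To make the triple-based data model concrete, here is a toy sketch in plain Python. It is not RDF tooling and not a SPARQL engine: facts are stored as (subject, predicate, object) tuples and a single triple pattern with `?`-prefixed variables is matched against them, which is roughly what a basic SPARQL triple pattern does. The `ex:` identifiers and the `match` helper are hypothetical names chosen for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All identifiers below are illustrative, not taken from the course.
TRIPLES = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:worksAt", "ex:polimi"),
}


def match(triples, pattern):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    A variable used twice in the pattern must bind to the same value.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable, conflicting values
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?s ?o WHERE { ?s foaf:knows ?o }
who_knows_whom = match(TRIPLES, ("?s", "foaf:knows", "?o"))
```

Because every source is reduced to the same three-part shape, data from different origins can be merged by taking the union of their triple sets, which is the intuition behind using RDF for integration.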

3. Time
The third part covers time-dependent data. The topics covered here address the "velocity" dimension of Big Data. This part shows why many Big Data analysis scenarios need to process data streams, coming for instance from Internet of Things (IoT) and social media sources, and it describes how to apply semantic and syntactic techniques to time-dependent information. For instance, it shows how to extend RDF to model RDF streams, how to extend SPARQL to continuously process RDF streams, and how to reason on those RDF streams.
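The idea of continuously querying a stream can be sketched with a time-based sliding window over timestamped triples. This is a simplified illustration, not the syntax or semantics of any specific RDF stream processing language: the `SlidingWindow` class, the sensor data, and the window width are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindow:
    """Keep only the timestamped triples from the last `width` time units."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, triple) pairs, oldest first

    def push(self, timestamp, triple):
        """Add a triple, then evict everything that fell out of the window."""
        self.items.append((timestamp, triple))
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def count(self, predicate):
        """Re-evaluate a simple query: how many triples use `predicate`?"""
        return sum(1 for _, (_, p, _) in self.items if p == predicate)


# Made-up stream of (timestamp, triple) pairs.
window = SlidingWindow(width=10)
stream = [
    (1, ("ex:sensor1", "ex:detects", "ex:car")),
    (4, ("ex:sensor2", "ex:detects", "ex:bike")),
    (12, ("ex:sensor1", "ex:detects", "ex:bus")),
]
counts = []
for ts, triple in stream:
    window.push(ts, triple)
    counts.append(window.count("ex:detects"))  # query re-runs on each arrival
```

The query is re-evaluated every time new data arrives, and old triples expire automatically; this is the key difference from a one-shot query over a static dataset.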

4. Applications
In the fourth part, the course focuses on specific application scenarios and presents the typical settings and problems where the presented techniques can be applied. This part discusses settings such as: big data analysis for smart cities; data analytics for brand monitoring (marketing) and event monitoring; data analysis for trend detection and user engagement; and so on.

Published in: Science

  1. Web Science Intro. Marco Brambilla, Emanuele Della Valle. @marcobrambi, @manudellavalle. Web Science
  2. There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy. Shakespeare (Hamlet, Act 1, Scene 5)
  3. The data deluge: social and pervasive data
  4. 1 GigaByte of data
  5. 1 ZettaByte of data (*) Or 152 million years of high-definition video.
  6. The new data revolution: from business to society
  7. What’s out in the world: the current IT revolution (Google, Amazon, Uber, Airbnb) focuses on large-scale data management & analysis
  8. Climate, energy, transportation & smart cities, disaster recovery, personalized medicine & health. Data can change society! Big data challenges for mankind
  9. From Data to Wisdom
  10. Data: models, languages, user interaction, analysis, mining and optimization.
  11. Formalizing new knowledge is hard. Only high frequency emerges. The long tail challenge
  12. Knowledge Extraction: text mining, Semantic Web, multimedia data extraction, …
  13. Heaven and Heart. How to peer through an effective window on the real world? Social media, our blessing and curse. Domain experts matter
  14. Beware the streetlamp effect: the bias of the source, the bias of the observer
  15. The challenges (4Vs): Volume, Variety, Velocity, Veracity
  16. Conclusion: learning about big data, data analytics and web data extraction is important for your future success!
  17. Agenda: Web Data API; Semantic Web; OWL; RDF + Web Data Publishing; SPARQL + R2RML; invited class - Harvard; Data Wrangling; R language
  18. Agenda: Classification + Clustering; Recommender Systems + Diversification; Graph Analysis; Big Data Technology; Streaming Data and API; Spark; Human Computation; Crowdsourcing; Case studies and conclusion
  19. Evaluation (a.k.a. exam): written exam with theory + exercises; project work (in groups of 3 people) with discussion; optional: reading and presentation
  20. Logistics: calendar always updated online at https://docs.google.com/spreadsheets/d/1YRXWK5ihby0hr8u6azkTKrWjylFgaCtoFGUsgElYoms/edit?usp=sharing; web site of the course with all materials
  21. Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it; Emanuele Della Valle, @manudellavalle, emanuele.dellavalle@polimi.it; http://datascience.deib.polimi.it
