Big data lab explores challenges and examples of big data

Alessandro Filazzola

BIOL2050 - Big data in ecology. Primer for test. Ecology at York University

Challenges of Big Data
• Overwhelming
• Difficult to sort through to find something meaningful
• Hard to manage

Examples of Big data
• http://www.coopercenter.org/demographics/Racial-Dot-Map
• http://internet-map.net/

Examples of Big data
www.google.com/trends/
- FIFA World Cup
- Beyoncé
- Potatoes
- VHS

Big Data: What is the Big deal?
Google grew from processing 100 TB of data a day in 2004 to 20 PB a day in 2008
We are producing more data than we are able to store or analyze
Economist, 2010

Big Data: What is the Big deal?
“Far Out” software

Big Data: What is the Big deal?
“Focusing on one individual at a time, we can provide better reminders, search results, and advertisements by considering all the locations the person is likely to be close to in the future (e.g., ‘Need a haircut? In 4 days, you will be within 100 meters of a salon that will have a $5 special at that time.’)”

Big Data: What is the Big deal?
Enable scientific breakthroughs
- Large Hadron Collider
- Sloan Digital Sky Survey
- Genomics
- Climate data

Hampton et al, 2013

Big data for ecology
• Ecologists produce large amounts of data, but the data need to be compiled
• Ecologists must treat data as products, just like publications
• Archive & share -> data repositories

Big data modeling exercise

Big Data for climate
Many different climate projects
- WorldClim
- CalClimate Commons
- NOAA
- European Climate Data
- Climate Data WMO

Climate data and rasters
Point < Line < Raster

Climate data and rasters
Weather station 1 Weather station 2

Climate data and rasters
Weather station 1 Weather station 2
Interpolated values

Big data & species distributions
Desert native: Chaenactis fremontii
Invasive thistle: Centaurea solstitialis

Climate & species distributions

Example
Consortium of California Herbaria – plant database
http://ucjeps.berkeley.edu/consortium/
CalAdapt – Climate commons
http://cal-adapt.org/data/tabular/

Plantago insularis

- Copy the records from the website
- Paste special, “as text”
- Delete every column except the GPS coordinates and the specimen ID
- Rename the specimen column to “id”
- Rename the coordinate columns to “lat” and “lng”
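The same clean-up can be scripted rather than done by hand in a spreadsheet. This is a minimal sketch using Python's standard csv module, assuming the herbarium records were saved as CSV with the hypothetical headers `Specimen`, `Latitude` and `Longitude` (check the real headers in your own download). It keeps only those three columns, renames them to `id`, `lat` and `lng`, and, as an extra step not in the slides, drops rows with no coordinates because they cannot be mapped.

```python
import csv
import io

# Hypothetical mapping from downloaded headers to the names used in the lab.
RENAME = {"Specimen": "id", "Latitude": "lat", "Longitude": "lng"}


def clean_records(csv_text):
    """Keep only the specimen ID and GPS columns, renamed to id/lat/lng.

    Rows missing either coordinate are skipped.
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {new: (row.get(old) or "").strip() for old, new in RENAME.items()}
        if record["lat"] and record["lng"]:
            cleaned.append({"id": record["id"],
                            "lat": float(record["lat"]),
                            "lng": float(record["lng"])})
    return cleaned


def write_records(records, out):
    """Write the cleaned records as CSV with the header id,lat,lng."""
    writer = csv.DictWriter(out, fieldnames=["id", "lat", "lng"])
    writer.writeheader()
    writer.writerows(records)


# Made-up example rows in the assumed download format.
sample = """Specimen,Collector,Latitude,Longitude,County
UC123,Smith,34.05,-117.2,San Bernardino
UC124,Jones,,,Kern
"""
records = clean_records(sample)
out = io.StringIO()
write_records(records, out)
print(out.getvalue())
```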

- Copy and paste
- Click away from data area
- Check settings to match below

Model climate change
• Pick one GPS point and remove all the others
• Set the time interval to daily and the climate model to CCSM3
• Download the data
• Plot temperatures from 1950 – 2099
• Will your species go extinct?
• Try other points
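Before plotting, it can help to reduce the daily series to one value per year and check the overall direction of change. This is a minimal Python sketch, assuming the downloaded file is a CSV with hypothetical columns `date` (YYYY-MM-DD) and `tmax`; the slides do not describe the Cal-Adapt download format, so adjust the names to match. The trend is an ordinary least-squares slope; the annual means can then be plotted in a spreadsheet or any plotting tool.

```python
import csv
import io
from collections import defaultdict


def annual_means(csv_text, column="tmax"):
    """Average a daily temperature column by calendar year.

    Assumes hypothetical columns 'date' (YYYY-MM-DD) and `column`.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["date"][:4])
        totals[year] += float(row[column])
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


def linear_trend(means):
    """Least-squares slope of annual mean temperature, in degrees per year.

    Needs at least two different years.
    """
    years = list(means)
    temps = [means[y] for y in years]
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(temps) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, temps))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx


# Made-up daily values for three years.
sample = """date,tmax
1950-01-01,10.0
1950-07-01,30.0
2000-01-01,11.0
2000-07-01,31.0
2099-01-01,14.0
2099-07-01,34.0
"""
means = annual_means(sample)
print(means)
print(f"trend: {linear_trend(means) * 100:.2f} degrees per century")
```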

Editor's Notes

In 2010, Google estimated that its search index held 100 million gigabytes of data. Every minute, 48 hours of video are uploaded to YouTube, users send over 100,000 tweets, Flickr users add 3,125 new photographs, and more than 570 new websites are created.
Big data is great, but it comes with some challenges. First, it is overwhelming in terms of how much there is. Consequently, it is difficult to sort through: thinking of the previous infographic, knowing that 48 hours of YouTube video are uploaded every minute isn’t necessarily informative. How can we sort this data into something manageable? This leads to the last challenge: big data is difficult to manage. Even if you have a question and know which data would answer it, how would you go about managing that data?
There is a dedicated science of organizing and processing extremely large amounts of data and conveying it in a simpler way. Here are two easy-to-understand visualizations built from exceptionally large amounts of data.
Google is an industry leader in processing data. It analyzes enormous amounts of data every second, and one visualization of this is Google Trends. The website shows the popularity of a search term over time and provides other statistics, including events that contributed to its popularity and the associated countries. Compare how trends rise and fall over time, and what may push a trend in a certain direction. Relate this to how the data would need to be collected and continually updated. Let the students explore this on their own.
The amount of data being analyzed keeps increasing. In 2004 Google was processing 100 terabytes (TB) of data a day; by 2008 this had grown to 20 petabytes (PB) a day, roughly a 200-fold increase. (1 PB = 1,000 TB in decimal units, or 1,024 TB in binary units, which gives about 205-fold.) Imagine the amount of data being processed today.
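A quick check of the growth figure in Python: going from 100 TB a day to 20 PB a day works out to roughly a 200-fold increase under either unit convention. The constant names are only for illustration.

```python
TB_PER_PB_DECIMAL = 1000  # SI prefixes: 1 PB = 1,000 TB
TB_PER_PB_BINARY = 1024   # binary convention (strictly PiB/TiB)


def fold_increase(start_tb, end_pb, tb_per_pb):
    """How many times larger the end volume is than the start volume."""
    return end_pb * tb_per_pb / start_tb


print(fold_increase(100, 20, TB_PER_PB_DECIMAL))  # 200.0
print(fold_increase(100, 20, TB_PER_PB_BINARY))   # 204.8
```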
The “Far Out” software claims to be able to predict your location years into the future, even if you don't know where you'll be. Far Out is the result of statistical research that analyzes GPS data, learns your typical movements, and then extrapolates to estimate your likely future location. According to the team behind it, the result is a system that can make "highly accurate" predictions about where you'll be years down the line.
Knowing where you are was 2008; knowing where you were going to be was last year. Now companies not only want to know where you are going to be, but also how to tailor what you will come across there.
Beyond advertising and industry, big data can enable scientific breakthroughs: for example, the Large Hadron Collider particle accelerator at CERN, the Sloan Digital Sky Survey, and genomics.
In ecology, big data sets include long-term experimental research and crowd-sourced data sets from the public, such as the Breeding Bird Survey. There are also climate measurements from weather stations and remote sensing from aerial photography.
Big data in ecology isn’t always a single long-term dataset. A great deal of existing data can be compiled to answer new questions, and similar experiments running in tandem around the world can address global challenges. Ecologists produce large volumes of data, but do not compile them.
There are many different climate projects covering different regions.
A raster is a plane of data. A data point is a single spot in space. A line connects two points, with interpolated values in between, so there is a value along the entire length of the line. A raster goes one step further: it is a plane of data, like a sheet of paper, with a value at every location.
Imagine two weather stations, one hot and one cold, both recording temperature continuously over time.
The raster fills in interpolated values across the entire area between the two weather stations, grading from hot to cold.
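The slides do not say which interpolation method is used. The simplest is linear interpolation, where the value changes in proportion to distance along the line between the two stations. This is a minimal Python sketch with made-up station positions and temperatures.

```python
def interpolate_linear(x, x1, t1, x2, t2):
    """Linearly interpolate the temperature at position x between
    station 1 at (x1, t1) and station 2 at (x2, t2)."""
    if x1 == x2:
        raise ValueError("stations must be at different positions")
    fraction = (x - x1) / (x2 - x1)
    return t1 + fraction * (t2 - t1)


# Hypothetical: a hot station (30 degrees) at km 0, a cold one (10 degrees) at km 100.
for km in (0, 25, 50, 75, 100):
    print(km, interpolate_linear(km, 0, 30.0, 100, 10.0))
```

Halfway between the stations the estimate is the average of the two readings; this is the "line" step before moving to a full raster.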
Now extend this to many more weather stations on a global scale.
This produces a network of values from the continuously recording weather stations.
The network is then rasterized using the interpolated values, so the raster has a temperature value for every point within the area.
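With many stations scattered in two dimensions, one common technique for filling a grid is inverse distance weighting (IDW): each cell takes a weighted average of the station readings, with nearer stations weighted more heavily. IDW is used here only as an illustration; the slides do not name a method, and climate products such as WorldClim use more sophisticated approaches. The station coordinates and temperatures below are hypothetical.

```python
import math


def idw(x, y, stations, power=2):
    """Inverse-distance-weighted temperature at (x, y).

    stations: list of (sx, sy, temp) tuples. Closer stations get more weight.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, temp in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return temp  # the cell sits exactly on a station: use its reading
        w = 1.0 / d ** power
        numerator += w * temp
        denominator += w
    return numerator / denominator


def make_raster(stations, width, height):
    """Fill a height x width grid (cell centres at integer x, y) with IDW values."""
    return [[idw(col, row, stations) for col in range(width)]
            for row in range(height)]


# Hypothetical stations: (x, y, temperature)
stations = [(0, 0, 30.0), (4, 0, 10.0), (2, 3, 18.0)]
raster = make_raster(stations, 5, 4)
for row in raster:
    print(" ".join(f"{t:5.1f}" for t in row))
```

Because every cell is a weighted average of the readings, all interpolated values stay between the coldest and hottest station.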
Big data can also be used to map species distributions, and these data can be generated by the public. For instance, here is Cal Flora, where anyone can record the occurrence of a plant species at a location in California. The data are constantly uploaded and used to generate maps of where each species can be found. Compare the desert native species, found mostly in the Mojave region, with the invasive thistle, which dominates the non-desert areas.
These species distributions can then be mapped onto climate data for the same area. With this information we can make inferences about the species and where it may be predicted to occur.
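Mapping occurrences onto climate data amounts to looking up the value of the raster cell that contains each occurrence point. This is a minimal Python sketch, assuming a hypothetical regular latitude/longitude grid stored as a list of rows (row 0 at the northern edge); real raster files would normally be read with a GIS library instead.

```python
def cell_value(grid, lat, lng, north, west, cell_size):
    """Return the grid value for the cell containing (lat, lng).

    grid[row][col]: row 0 is the northern edge, col 0 the western edge.
    Returns None if the point falls outside the grid.
    """
    row = int((north - lat) // cell_size)
    col = int((lng - west) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 2 x 3 temperature grid with 1-degree cells, north edge 36, west edge -120.
grid = [[22.0, 24.0, 27.0],
        [20.0, 23.0, 26.0]]
occurrences = [{"id": "UC123", "lat": 35.5, "lng": -118.5},
               {"id": "UC999", "lat": 40.0, "lng": -118.5}]
for occ in occurrences:
    print(occ["id"], cell_value(grid, occ["lat"], occ["lng"], 36.0, -120.0, 1.0))
```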
California is advanced in managing and compiling data, including climate and species distributions. We are going to use the Consortium of California Herbaria, a publicly contributed database of plant occurrences over the last 50 years. The data are publicly available and contain a fair amount of information beyond just the occurrence. CalAdapt is a climate database that uses past weather-station records to project future climate scenarios. Our exercise is to model the distribution of a plant species under future climate projections.
The thermal niche of Plantago insularis: above 25 degrees and below 11 degrees, the likelihood of occurrence decreases significantly.
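To connect this thermal niche to the 1950 – 2099 projections, one simple approach (an assumption, not the lab's prescribed method) is to count the share of days a location falls outside the niche in each year. The sketch assumes the thresholds are in °C and that the downloaded daily temperatures are grouped by year; the values below are made up.

```python
NICHE_MIN = 11.0  # degrees, assumed Celsius
NICHE_MAX = 25.0


def fraction_outside_niche(daily_temps, low=NICHE_MIN, high=NICHE_MAX):
    """Share of days colder than `low` or hotter than `high`."""
    if not daily_temps:
        raise ValueError("no temperatures supplied")
    outside = sum(1 for t in daily_temps if t < low or t > high)
    return outside / len(daily_temps)


def fraction_by_year(temps_by_year):
    """Apply fraction_outside_niche to each year's daily temperatures."""
    return {year: fraction_outside_niche(temps)
            for year, temps in sorted(temps_by_year.items())}


# Hypothetical daily temperatures for a few days in two years.
sample = {
    1950: [9.0, 14.0, 20.0, 24.0],
    2099: [15.0, 26.0, 28.0, 30.0],
}
print(fraction_by_year(sample))
```

A rising share of days outside the niche at a point would suggest conditions there are becoming less suitable; trying other points, as the lab suggests, shows whether this holds across the range.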