“Big Data” and the Challenges for Statisticians
Setia Pramana
Math Department, FMIPA Brawijaya University
Malang, 7 February 2014
Published in: Education
  • Previously, it was difficult to get data, especially for a Skripsi (undergraduate thesis).
  • Researchers from the two organisations obtained data on the outflow of people from Port-au-Prince following the earthquake by tracking the movement of nearly two million SIM cards in the country. They were able to accurately analyse the destination of over 600,000 people displaced from Port-au-Prince, and they made this information available to government and humanitarian organisations dealing with the crisis.

    1. “Big Data” and the Challenges for Statisticians
       Setia Pramana, Math Department, FMIPA Brawijaya University, Malang, 7 February 2014
    2. Data Explosion
       • Interactions of billions of people using computers, GPS devices, cell phones, and medical devices.
       • Online or mobile financial transactions, social media traffic, and GPS coordinates.
       • “In the next five years, we’ll generate more data as humankind than we generated in the previous 5,000 years”. Eron Kelly, GM, Microsoft
    3. Data Explosion
       • Interactions of billions of people using computers, GPS devices, cell phones, and medical devices.
       • Online or mobile financial transactions, social media traffic, and GPS coordinates.
       • “In the next five years, we’ll generate more data as humankind than we generated in the previous 5,000 years”. Eron Kelly, GM, Microsoft
    4. Data Explosion
       http://www.csc.com/
    5. Big Data
    6. Big Data
       • Volume
       • Velocity
       • Variety
       http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data
    7. Big Data
       • Volume
       • Velocity
       • Variety
       http://www.documentcapture.co.uk
       A challenge: managing its size (storing, searching, analyzing, comparing, refining, combining, and visualizing).
    8. Big Data
       • Veracity: biases, noise, and abnormality in data.
       • Validity: is the data correct and accurate for the intended use?
       • Volatility: how long is the data valid, and how long should it be stored?
    9. Data Analysis Evolution
       http://www.csc.com/
    10. Connecting the Data
       http://www.csc.com/
    11. Connecting the Data
    12. Connecting the Data
       • After Haiti’s earthquake (2010), researchers at the Karolinska Institute and Columbia University showed that mobile data patterns could be used to understand the movement of refugees and the consequent health risks posed by these movements.
    13. Big Data in Biomedicine: Where does the big data come from?
       • Billions of measurements in the health system: physician diagnoses, drugs dispensed, blood tests, X-rays or CT scans, etc.
       • Advanced molecular technologies: microarrays, next-generation sequencing.
    14. Big Data in Biomedicine: Where does the big data come from?
    15. Microarray
       • Measures the expression of thousands of genes under different conditions.
       • Thousands of variables, so special statistical methods are needed.
    16. More: http://www.slideshare.net/hafidztio/geneexpression-introduction.
    17. Publicly Available Data
    18. Publicly Available Data
       ~300 diseases!
    19. What Can We Do?
    20. Relate Several Data Repositories
       Disease Gene Expression DB
       Drug Gene Expression DB
       • New drug discovery
       • Drug repositioning, e.g., Viagra (unexpected)
    21. More...
       • Genome Project
       • Next-generation sequencing, e.g., whole-genome sequencing: information on our 3 billion bp DNA code
       • And many more...
    22. Data Science
       • A multidisciplinary science: statistics, math, computer science, machine learning, data munging/cleaning, data visualization, and domain expertise.
       http://drewconway.com
       Data Scientist: The Sexiest Job of the 21st Century
    23. Data Scientist vs. Statistician
       http://blog.revolutionanalytics.com
    24. Is the Statistician an Endangered Species?
    25. Statisticians Should...
       • Have a strong foundation in statistical theory, methods, and software.
       • Be expert in R and Python.
       • Be familiar with data visualization and machine learning techniques.
       • Know about parallel computing, combining data from disparate sources, and handling textual and streaming data.
       • Get engaged with the real world.
       • Be more innovative.
    26. R
       • The number of R users is growing.
    27. R
       • The continued rapid growth in add-on packages.
       • The near monopoly R has on the latest analytic methods.
       • Its free price.
       • The freedom to teach with real-world examples from outside organizations.
    28. Visualization: Twitter traffic
    29. Learn From the Expert
    30. Ready? Compete!
    31. Reference
    32. Thank you...
