Big Data
1. How much data?
• 800 Terabytes, 2001
• 60 Exabytes, 2006
• 500 Exabytes, 2009
• 2.7 Zettabytes, 2012
• 35 Zettabytes by 2020
How much data is generated in one day?
• 7 TB, Twitter
• 15 TB, Facebook (2009)
• 20 PB, Google (2008)
• 6.5 PB, eBay (2009)
2. A visualization created by IBM of Wikipedia edits. At multiple terabytes in size, the text and images of Wikipedia are a classic example of big data.
3. • Relational Data (Tables/Transactions/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
› Social Networks, Semantic Web (RDF)
• Streaming Data
› You can only scan the data once
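The streaming constraint above ("you can only scan the data once") is exactly the setting for one-pass algorithms. A minimal sketch in Python using reservoir sampling; the function name and the example stream are illustrative, not from the slides:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of
    unknown length, scanning the data exactly once."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1)
            j = random.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# Example: sample 5 items from a stream too large to store
print(reservoir_sample(iter(range(1_000_000)), 5))
```

Each item is kept with equal probability no matter how long the stream turns out to be, which is why this works without a second pass.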
4. • Aggregation and Statistics
› Data warehouse and OLAP
• Indexing, Searching, and Querying
› Keyword-based search
› Pattern matching (XML/RDF)
• Knowledge Discovery
› Data Mining
› Statistical Modeling
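The aggregation-and-statistics workload above can be sketched as a toy OLAP-style roll-up; the transaction table and column names here are invented for illustration:

```python
from collections import defaultdict

# Hypothetical transaction table: (region, product, amount)
transactions = [
    ("EU", "laptop", 1200.0),
    ("EU", "phone",   650.0),
    ("US", "laptop",  999.0),
    ("US", "phone",   700.0),
    ("US", "phone",   720.0),
]

# Roll-up: total sales per region (what a data warehouse
# precomputes at scale for OLAP queries)
totals = defaultdict(float)
for region, _product, amount in transactions:
    totals[region] += amount

print(dict(totals))  # {'EU': 1850.0, 'US': 2419.0}
```

A real warehouse would materialize such aggregates along several dimensions (region, product, time) so queries do not rescan the raw transactions.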
5. • There is no consensus on how to define Big Data.
“Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” - Teradata Magazine article, 2011
“Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.” - The McKinsey Global Institute, 2011
6. Variety
Structured and unstructured data: clinical notes, audio transcription, imaging, click streams
Velocity
Often time-sensitive; data must be analyzed as it is streaming in to maximize its value (e.g. patient monitoring)
Volume
Electronic medical records, images, digital pathology, email, web communications
7. 1) Automatically generated by a machine (e.g. a sensor embedded in an engine)
2) Typically an entirely new source of data (e.g. use of the internet)
3) Not designed to be friendly (e.g. text streams)
4) May not have much value
› Need to focus on the important part
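Point 4 above, that machine-generated data may not have much value so you focus on the important part, often amounts to filtering a sensor stream against an expected operating band. A hypothetical sketch; the readings and thresholds are made up:

```python
# Machine-generated readings: most are routine; keep only the
# "important part" (values outside the expected operating band).
readings = [71.9, 72.0, 72.1, 95.3, 71.8, 40.2, 72.2]

LOW, HIGH = 60.0, 80.0  # assumed normal operating band

# Keep (index, value) pairs for out-of-band readings only
alerts = [(i, r) for i, r in enumerate(readings) if not LOW <= r <= HIGH]
print(alerts)  # [(3, 95.3), (5, 40.2)]
```

Filtering close to the source like this is the usual way to keep sensor volume from overwhelming downstream storage.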
8. The Past → The Present → The Future
• Most new data sources were considered big and difficult
• Just the next wave of new, bigger data
9. (1) The “big” part
(2) The “data” part
(3) Both
(4) Neither
The answer is choice (4): what is important is what organizations do with Big Data.
10. • Decoding the human genome originally took 10 years to process; now it can be achieved in less than a week.
• Tobias Preis used Google Trends data to demonstrate that Internet users from countries with a higher per capita GDP are more likely to search for information about the future than about the past. The findings suggest there may be a link between online behaviour and real-world economic indicators.
• Tobias Preis and H. Eugene Stanley’s analysis of Google search volume for 98 terms of varying financial relevance, published in Scientific Reports, suggests that increases in search volume for financially relevant search terms tend to precede large losses in financial markets.
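The Preis–Stanley result above rests on correlating search volume with subsequent market moves. A toy sketch of the underlying measurement, Pearson correlation computed from scratch; the numbers are invented for illustration and are not the paper's data or method:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical weekly search volumes and the *following* week's
# market return: spikes in searching precede losses here by design.
search = [10, 12, 30, 28, 11, 9]
returns = [0.4, 0.3, -1.2, -0.9, 0.2, 0.5]
print(round(pearson(search, returns), 3))
```

A strongly negative coefficient on data like this is what "search volume tends to precede losses" looks like numerically; the real study of course controls for far more than this sketch does.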
11. 1. Big data can unlock significant value by making information transparent and usable at much higher frequency.
2. As organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information, and therefore expose variability and boost performance.
3. Big data allows ever-narrower segmentation of customers, and therefore much more precisely tailored products or services.
4. Sophisticated analytics can substantially improve decision-making.
5. Big data can be used to improve the development of the next generation of products and services.
13. 1. The use of big data will become a key basis of competition and growth for individual firms. All companies need to take big data seriously.
2. The use of big data will underpin new waves of productivity growth (growth of 60% possible) and consumer surplus.
3. The computer and electronic products and information sectors, as well as finance and insurance, and government, are poised to gain substantially from the use of big data.
4. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
14. • Organizations may be overwhelmed
› Need the right people and solve the right problems
• Costs escalate too fast
› It isn’t necessary to capture 100% of the data
• Sources of big data may be private
› Self-regulation
› Legal regulation
15. • Very strong assumptions are made about mathematical properties that may not at all reflect what is really going on at the level of micro-processes.
• Even as companies invest eight- and nine-figure sums to derive insight from information streaming in from suppliers and customers, less than 40% of employees have sufficiently mature processes and skills to do so.
• Decisions based on the analysis of Big Data are inevitably “informed by the world as it was in the past, or, at best, as it currently is”.
• If the system dynamics of the future change, the past can say little about the future. For this, it would be necessary to have a thorough understanding of the system dynamics, which implies theory.
16. • The biggest value in big data can be driven by combining big data with other corporate data
Big data + Other data → a synergy effect
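Combining big data with other corporate data usually comes down to a join on a shared key. A hypothetical sketch; the tables, ids, and company names are invented:

```python
# Hypothetical "big data" side: raw clickstream events (customer_id, page)
clicks = [
    (1, "/pricing"),
    (2, "/docs"),
    (1, "/checkout"),
]

# Hypothetical "other corporate data": customer master, id -> name
customers = {1: "ACME Corp", 2: "Globex"}

# Join the two sources on customer id to enrich each event
joined = [(customers[cid], page) for cid, page in clicks if cid in customers]
print(joined)
# [('ACME Corp', '/pricing'), ('Globex', '/docs'), ('ACME Corp', '/checkout')]
```

The synergy claimed on the slide comes from exactly this enrichment: raw events alone say little, but joined to the customer record they support segmentation and tailored offers.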
17. • The banking industry’s data was very hard to handle even a decade ago
• “BIG” will change
› Big data will continue to evolve
18. • en.wikipedia.org/wiki/Big_data
• www.google.com
• www.mckinsey.com