A Big Data Concept

Published in: Engineering, Technology, Education

  1. Concept of Big Data
     Presented by MTech-CE (Boys Group)
  2. What Is Data?
     The word data is the plural of datum, which comes from the Latin verb dare, "to give"; a datum is literally "something given". As an abstract concept, data can be viewed as the lowest level of abstraction, from which information and then knowledge are derived: information in raw or unorganized form (such as letters, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects. Data is limitless and present everywhere in the universe.
     In computing, data means the symbols or signals that are input, stored, and processed by a computer for output as usable information.
  3. Types of Data
     - Relational data (tables, transactions, legacy data)
     - Text data (Web)
     - Semi-structured data (XML)
     - Graph data: social networks, Semantic Web (RDF), …
     - Streaming data: you can only scan the data once
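For the streaming-data case ("you can only scan the data once"), an algorithm has to keep a small running summary instead of storing the stream. The sketch below uses Welford's online algorithm for mean and variance as an illustrative choice; the slides do not name a technique, and `running_stats` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Hypothetical helper: return (count, mean, sample variance) in one pass.

    Welford's online algorithm keeps only three numbers in memory, so each
    element is seen exactly once and is never stored.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# A generator can be consumed only once, like a live data stream.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
print(running_stats(readings))
```

Memory use stays constant no matter how long the stream runs, which is the property that matters when the data cannot be rescanned.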
  4. Big Data Definition
     - Big data is a massive volume of both structured and unstructured data that is so large that it is difficult to process with traditional database and software techniques.
     - Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
     - Big data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.
  5. Where Is the Big Data?
     - Walmart handles more than 1 million customer transactions every hour.
     - Facebook hosts 40 billion photos from its user base.
     - Decoding the human genome originally took 10 years; now it can be done in one week.
     - Google processes 20 PB a day (2008).
     - The Wayback Machine holds 3 PB and grows by 100 TB/month (3/2009).
     - Facebook has 2.5 PB of user data and adds 15 TB/day (4/2009).
     - eBay has 6.5 PB of user data and adds 50 TB/day (5/2009).
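Daily volumes like these are easier to compare as sustained throughput. A small sketch that converts the slide's daily figures into average bytes per second, assuming decimal (SI) units and a rate spread evenly over 24 hours; `per_second` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
TB = 10**12
PB = 10**15

# Daily volumes quoted on the slide (decimal units).
daily_volumes = {
    "Google (2008)": 20 * PB,
    "eBay (5/2009)": 50 * TB,
    "Facebook (4/2009)": 15 * TB,
}


def per_second(bytes_per_day: int) -> float:
    """Hypothetical helper: average bytes/second, assuming an even rate over 24 h."""
    return bytes_per_day / SECONDS_PER_DAY


for source, volume in daily_volumes.items():
    rate_gb_s = per_second(volume) / 10**9
    print(f"{source}: ~{rate_gb_s:.2f} GB/s")
```

Google's 20 PB/day works out to roughly 231 GB/s on average; real traffic is bursty, so peak rates would be higher.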
  6. Data Units
     Big data is data whose volume grows faster than Moore's law (transistor counts doubling roughly every two years).
     - 1 byte (B) = 8 bits
     - 1 kilobyte (KB) = 10^3 bytes
     - 1 megabyte (MB) = 10^6 bytes
     - 1 gigabyte (GB) = 10^9 bytes
     - 1 terabyte (TB) = 10^12 bytes
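The decimal unit ladder can be applied mechanically to any byte count. A short sketch, assuming powers of 1000 as on the slide (not the binary, 1024-based KiB/MiB units); `human_readable` is a hypothetical helper name.

```python
# Decimal unit ladder from the slides: each step is a factor of 10^3.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Hypothetical helper: format a byte count with decimal (powers of 1000) units."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    unit_index = 0
    # Step up one unit per factor of 1000, stopping at the largest unit listed.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"


print(human_readable(8))              # → 8.0 B
print(human_readable(2.5 * 10**15))   # Facebook's 2.5 PB of user data → 2.5 PB
```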
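  7. Big Big Big Data
     - 1 petabyte (PB) = 10^15 bytes
     - 1 exabyte (EB) = 10^18 bytes
     - 1 zettabyte (ZB) = 10^21 bytes
     - 1 yottabyte (YB) = 10^24 bytes
     - Beyond the yottabyte, the names xenottabyte (10^27 bytes), shilentnobyte (10^30 bytes) and domegrottebyte (10^33 bytes) circulate informally but are not standard units; the official SI prefixes adopted in 2022 are ronna (R, 10^27) and quetta (Q, 10^30).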
  8. Characteristics of Big Data
  9. Volume
     - Data volume: a 44x increase from 2009 to 2020, from 0.8 zettabytes in 2009 to a projected 35 ZB in 2020
     - Data volume is increasing exponentially
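The "44x" figure and the word "exponentially" can be checked with a little arithmetic. If volume grows by a constant annual factor r over n years, then V_end = V_start * r^n, so r = (V_end / V_start)^(1/n). A sketch using the slide's 2009 and 2020 figures (n = 11 years); `annual_growth_rate` is a hypothetical helper name.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Hypothetical helper: compound annual growth rate, i.e. r - 1
    such that start * r**years == end."""
    return (end / start) ** (1 / years) - 1


start_zb, end_zb = 0.8, 35.0  # zettabytes in 2009 and 2020 (slide figures)
years = 2020 - 2009           # 11 years

growth_factor = end_zb / start_zb
rate = annual_growth_rate(start_zb, end_zb, years)
print(f"Growth factor: {growth_factor:.2f}x")  # → Growth factor: 43.75x
print(f"Implied annual growth: {rate:.1%}")
```

The implied rate is roughly 41% per year, which is close to doubling every two years (1.41^2 ≈ 2), a constant-ratio growth pattern consistent with "exponential".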
  10. Variety
      - Various formats, types, and structures: text, numerical data, images, audio, video, sequences, time series, social media data, multidimensional arrays, etc.
      - Static data vs. streaming data
      - A single application can generate or collect many types of data
  11. Velocity
      - Data is being generated fast and needs to be processed fast
      - Online data analytics
      - Late decisions → missed opportunities
      - Examples:
        - E-promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
        - Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires immediate reaction
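The healthcare-monitoring example implies checking each reading as it arrives rather than after a batch job. A minimal sketch, assuming a hypothetical heart-rate sensor and a hypothetical rule that flags any reading more than 30% away from the average of the previous five readings; the window size, threshold, data, and names are illustrative, not from the slides.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def detect_anomalies(
    readings: Iterable[float], window: int = 5, tolerance: float = 0.30
) -> Iterator[Tuple[int, float, float]]:
    """Hypothetical detector: yield (index, reading, recent_average) for readings
    that deviate from the average of the previous `window` readings by more
    than `tolerance` (a fraction of that average).

    Each reading is checked as soon as it arrives, so an alert can be raised
    immediately instead of waiting for the whole data set.
    """
    recent: deque = deque(maxlen=window)  # only the last `window` values are kept
    for i, value in enumerate(readings):
        if len(recent) == window:
            average = sum(recent) / window
            if abs(value - average) > tolerance * average:
                yield i, value, average
        recent.append(value)


heart_rate = [72, 75, 71, 74, 73, 120, 74, 72]  # hypothetical beats per minute
for index, value, average in detect_anomalies(heart_rate):
    print(f"reading #{index}: {value} bpm vs recent average {average:.1f}")
```

In this sketch the anomalous reading itself enters the window; a real monitor might exclude flagged values so that one spike does not mask the next.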
  12. Big Data: the 3 Vs (Volume, Variety, Velocity)
  13. Some Make It 4 Vs
  14. Harnessing Big Data
      - OLTP: Online Transaction Processing (DBMSs)
      - OLAP: Online Analytical Processing (data warehousing)
      - RTAP: Real-Time Analytics Processing (big data architecture and technology)
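The difference between OLTP and OLAP shows up in the queries themselves: OLTP runs many short transactions that each touch a row or two, while OLAP runs a query that scans and aggregates many rows. A small sketch using Python's standard-library `sqlite3` module with an in-memory database; the `sales` table and its columns are hypothetical. RTAP would run similar aggregates continuously over arriving data rather than over a stored table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration.
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)")

# OLTP style: each sale is committed as its own small transaction.
new_sales = [("north", 12.5), ("south", 30.0), ("north", 7.5)]
for store, amount in new_sales:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sales (store, amount) VALUES (?, ?)", (store, amount))

with conn:
    # Correct one sale in place: a point update located by primary key.
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (10.0, 3))

# OLAP style: one analytical query that scans the table and aggregates it.
totals = conn.execute(
    "SELECT store, COUNT(*), SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)
conn.close()
```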
  15. Layout
  16. Who's Generating Big Data?
      - Social media and networks (all of us are generating data)
      - Scientific instruments (collecting all sorts of data)
      - Mobile devices (tracking all objects all the time)
      - Sensor technology and networks (measuring all kinds of data)
  17. Implementation of Big Data
      - Parallel DBMS technologies
        - Proposed in the late eighties
        - Matured over the last two decades
        - Multi-billion-dollar industry: proprietary DBMS engines intended as data warehousing solutions for very large enterprises
      - MapReduce
        - Pioneered by Google
        - Popularized by Yahoo! (Hadoop)
  18. Metadata Management of Big Data
  19. Comparison: MapReduce vs. Parallel DBMS Technologies
      MapReduce:
      - Data-parallel programming model
      - An associated parallel and distributed implementation for commodity clusters
      - Popularized by open-source Hadoop
      - Used by Yahoo!, Facebook, Amazon, and the list is growing …
      Parallel DBMS technologies:
      - Popularly used for more than two decades
      - Research projects: Gamma, Grace, …
      - Commercial: multi-billion-dollar industry, but accessible only to a privileged few
      - Relational data model
      - Indexing
      - Familiar SQL interface
      - Advanced query optimization
      - Well understood and studied
  20. MapReduce Advantages
      - Automatic parallelization:
        - Depending on the size of the raw input data → instantiate multiple MAP tasks
        - Similarly, depending on the number of intermediate <key, value> partitions → instantiate multiple REDUCE tasks
      - The run-time system handles:
        - Data partitioning
        - Task scheduling
        - Machine failures
        - Inter-machine communication
      - All of this is completely transparent to the programmer / analyst / end user
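The division of labour described above (map tasks over input splits, intermediate <key, value> pairs partitioned across reduce tasks) can be simulated in a single process. A minimal sketch of word count, the classic MapReduce example; it runs sequentially and all function names are hypothetical, whereas a framework such as Hadoop would run the tasks on different machines and handle scheduling and failures itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NUM_REDUCERS = 2  # number of intermediate partitions / reduce tasks (illustrative)


def map_task(split: str) -> List[Tuple[str, int]]:
    """Map (hypothetical name): emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for word in split.split()]


def partition(key: str) -> int:
    """Assign an intermediate key to a reduce task.

    Python's built-in hash() of str is salted per process, so a stable
    character-sum hash is used here to keep the assignment deterministic.
    """
    return sum(ord(ch) for ch in key) % NUM_REDUCERS


def reduce_task(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce (hypothetical name): sum the counts for each key in one partition."""
    counts: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


# Raw input is divided into splits; one map task per split.
splits = ["big data is big", "data grows fast", "big clusters process data"]
intermediate = [pair for split in splits for pair in map_task(split)]

# Shuffle: route each intermediate pair to the reducer that owns its key.
partitions: List[List[Tuple[str, int]]] = [[] for _ in range(NUM_REDUCERS)]
for word, count in intermediate:
    partitions[partition(word)].append((word, count))

# One reduce task per partition; each key lives in exactly one partition,
# so merging the reducers' outputs cannot double-count.
result: Dict[str, int] = {}
for part in partitions:
    result.update(reduce_task(part))
print(sorted(result.items()))
```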
  21. Big Dataset (Hadoop)
  22. Why Hadoop?
      Big data analytics and the Apache Hadoop open-source project are rapidly emerging as the preferred solution to the business and technology trends that are disrupting traditional data management and processing.
  23. Hadoop Adoption in Industry
  24. What Is Hadoop?
  25. Challenges in Big Data
      - Big data integration is multidisciplinary
        - Less than 10% of the big data world is genuinely relational
        - Meaningful data integration in the real, messy, schema-less, and complex big data world of databases and the Semantic Web calls for multidisciplinary, multi-technology methods
      - The Linked Open Data Ripper: mapping, ranking, visualization, key matching, snappiness
      - Demonstrate the value of semantics: let data integration drive DBMS technology
        - Large volumes of heterogeneous data, like linked data and RDF
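Key matching, one of the integration tasks listed above, means deciding when records from different sources refer to the same entity even though their identifiers are written differently. A deliberately simple sketch, assuming two hypothetical sources whose company names differ only in case, punctuation, and a legal suffix; real entity resolution usually needs fuzzy matching and schema mapping as well.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical records from two sources with different schemas.
crm_records = [
    {"company": "Acme Corp.", "city": "Boston"},
    {"company": "Globex, Inc", "city": "Springfield"},
]
web_records = [
    {"name": "ACME corp", "followers": 1200},
    {"name": "Initech", "followers": 300},
    {"name": "globex inc.", "followers": 950},
]

LEGAL_SUFFIXES = {"corp", "inc", "ltd", "llc"}  # illustrative list


def match_key(name: str) -> str:
    """Hypothetical normalizer: lowercase, replace punctuation with spaces,
    and drop common legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def integrate(crm: List[Dict], web: List[Dict]) -> List[Tuple[str, str, int]]:
    """Join the two sources on the normalized key; unmatched records are skipped."""
    web_by_key = {match_key(r["name"]): r for r in web}
    merged = []
    for record in crm:
        other = web_by_key.get(match_key(record["company"]))
        if other is not None:
            merged.append((record["company"], record["city"], other["followers"]))
    return merged


print(integrate(crm_records, web_records))
```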
  26. Provocations for Big Data
      1. Automating research changes the definition of knowledge
      2. Claims to objectivity and accuracy are misleading
      3. Bigger data are not always better data
      4. Not all data are equivalent
      5. Just because it is accessible doesn't make it ethical
      6. Limited access to big data creates new digital divides
  27. Who Is Collecting All This Big Data?
      - Web browsers
      - Search engines
  28. Who Is Collecting All This Big Data?
      - Smartphones and apps
        - Apple's iPhone (Apple iOS)
        - Samsung, HTC, Nokia, Motorola (Android OS)
        - RIM's BlackBerry (BlackBerry OS)
      - Tablet computers and apps
        - Apple's iPad
        - Samsung's Galaxy
        - Amazon's Kindle Fire
  29. Who Is Collecting for What?
      Credit card companies. What data are they getting?
      - Restaurant checks
      - Grocery bills
      - Airline tickets
      - Hotel bills
  30. Why Are They Collecting All This Data?
      - Target marketing
        - To send you catalogs for exactly the merchandise you typically purchase
        - To suggest medications that precisely match your medical history
        - To "push" television channels to your set instead of your "pulling" them in
        - To send advertisements on those channels just for you!
      - Targeted information
        - To know what you need before you even know you need it, based on past purchasing habits!
        - To notify you of your expiring driver's license or credit cards, or your last prescription (Rx) refill, etc.
        - To give you turn-by-turn directions to a shelter in case of emergency
  31. Future Enhancement
      Smartphones and tablets outsold desktop and laptop computers in 2011. In 2012 there were more smartphones in the U.S. than people. The phone in your pocket has more programmable memory, more storage, and more capability than several large IBM computers. It takes dozens of microprocessors running 100 million lines of code to get a premium car out of the driveway, and this software is only going to get more complex. In fact, software and electronics account for 30-40% of the car's price.
  32. Conclusion
      - Big data and big data analytics are not just for large organizations
      - It is not just about building bigger databases
      - Moving processing to the data source yields big dividends
      - Choose the most appropriate big data scenario:
        - Complete data scenario: entire data sets can be properly managed and factored into analytical processing, complete with in-database or in-memory processing and grid technologies
        - Targeted data scenario: analytics and data management tools determine the right data to feed into analytic models, for situations where using the entire data set isn't technically feasible or adds little value
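"Moving processing to the data source" can be illustrated with an average computed over partitioned data: instead of shipping every row to one place, each partition sends back only a small partial result. A minimal sketch, assuming three hypothetical partitions that would live on different nodes; here they are plain lists in one process, and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data partitions, each imagined to live on a different node.
partitions: List[List[float]] = [
    [12.0, 15.5, 9.0, 20.0],
    [7.5, 30.0],
    [11.0, 14.0, 16.0],
]


def local_summary(rows: List[float]) -> Tuple[float, int]:
    """Runs next to the data: reduce one partition to (sum, count)."""
    return sum(rows), len(rows)


def combine(summaries: List[Tuple[float, int]]) -> float:
    """Runs at the coordinator: merge partial results into a global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


summaries = [local_summary(p) for p in partitions]
values_shipped_raw = sum(len(p) for p in partitions)  # if raw rows were moved
values_shipped_pushed = 2 * len(summaries)            # two numbers per partition
print(f"mean = {combine(summaries):.3f}")
print(f"values moved: {values_shipped_raw} raw vs {values_shipped_pushed} pushed down")
```

With toy partitions the saving is small, but the pushed-down cost stays at two numbers per partition while the raw cost grows with every row, which is where the "big dividends" come from. Note that the mean must be combined from sums and counts; averaging the per-partition means would weight small partitions too heavily.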
  33. Closing Thought
      Big data is not just about helping an organization be more successful, to market more effectively or improve business operations. High-performance analytics are designed to support big data initiatives, with in-memory, in-database, and grid computing options. Organizations can also benefit from cloud computing, where big data analytics is delivered as a service and IT resources can be quickly adjusted to meet changing business demands. On-demand offerings give customers the option to push big data analytics to the cloud, largely eliminating the time, capital expense, and maintenance associated with on-premises deployments.
  34. Thank You