Agenda
• What is Big data?
• Some BIG facts
• Objective
• Sources
• 3 V’s of Big data
• 3 + 1 V’s of Big data
• Technologi...
What is Big data?

Data

Big Data
What is Big data?

Data

Big Data
Some BIG facts
• 90% of the data in the world today has been created in the
last two years alone
• IDC Forecasting: The gl...
Some BIG facts – What happens everyday?
• The New York Stock Exchange generates about one
terabyte of new trade data
• Zyn...
Some BIG facts – What happens every minute?

Courtesy: http://practicalanalytics.files.wordpress.com
Big data – Objective

Effectively store, manage and analyze all
the data to create meaningful information
out of it
Big data – Sources
Big data – 3 V’s of Big data

Courtesy: bigdatablog.emc.com
Big data – 3 + 1 V’s of Big data

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Volumes are in:
• Terabytes
• Exabytes
• Petabytes
• Zetabytes

Courtesy: http://www.datasciencecentral...
Big data - Volume

Name

Value

1 GB
1 Terabyte (TB)

1024 GB

1 Petabyte (PB)

1,048,576 GB

1 Exabyte (EB)

1,073,741,82...
Big data - Velocity

• Live Stream
• Real time
• Batch

Courtesy: http://www.datasciencecentral.com/
Big data - Variety

• Structured (Tables)
• Unstructured (Tweets, SMSes)
• Semi-structured (Logfiles, RFID)

Courtesy: htt...
Big data - Veracity

• This kind of data is often
overlooked
• It is now considered as
important as 3 V’s of Big Data
• Ef...
Big data Technologies
Technologies & Solution providers:
• Storage (MS SqlServer, Apache Hadoop, Mongo DB)
• Processing (M...
Big data - Opportunities
•
•
•
•
•

Storage
Processing
Analytics
Integration
Solution
Big data – Major Players
Big data – Questions?
Big data – Thank you !!!
Welcome to big data
Upcoming SlideShare
Loading in …5
×

Welcome to big data

2,412 views

Published on

This presentation provides a good information about basics of Big Data.

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
2,412
On SlideShare
0
From Embeds
0
Number of Embeds
16
Actions
Shares
0
Downloads
51
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain.Considering variety and velocity of big data, an organization can no longer commit time and resources on traditional ETL/ELT and data preparation to clean up the data to make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case by case basis yet must be factored. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
  • Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain.Considering variety and velocity of big data, an organization can no longer commit time and resources on traditional ETL/ELT and data preparation to clean up the data to make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case by case basis yet must be factored. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
  • Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain.Considering variety and velocity of big data, an organization can no longer commit time and resources on traditional ETL/ELT and data preparation to clean up the data to make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case by case basis yet must be factored. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
  • Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain.Considering variety and velocity of big data, an organization can no longer commit time and resources on traditional ETL/ELT and data preparation to clean up the data to make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case by case basis yet must be factored. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
  • Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain.Considering variety and velocity of big data, an organization can no longer commit time and resources on traditional ETL/ELT and data preparation to clean up the data to make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case by case basis yet must be factored. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
  • Data Veracity, uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data pursuant to unreasonably large amounts of human capital spent on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain.Considering variety and velocity of big data, an organization can no longer commit time and resources on traditional ETL/ELT and data preparation to clean up the data to make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case by case basis yet must be factored. It may be prudent to assign a Data Veracity score and ranking for specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
  • Welcome to big data

    1. 1. Agenda • What is Big data? • Some BIG facts • Objective • Sources • 3 V’s of Big data • 3 + 1 V’s of Big data • Technologies • Opportunities • Major Players • Questions • Conclusion
    2. 2. What is Big data? Data Big Data
    3. 3. What is Big data? Data Big Data
    4. 4. Some BIG facts • 90% of the data in the world today has been created in the last two years alone • IDC Forecasting: The global universe of data will double every two years, reaching 40,000 exabytes or 40 trillion GB by 2020 • The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year. • Ancestry.com, the genealogy site, stores around 2.5 petabytes of data. • The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
    5. 5. Some BIG facts – What happens everyday? • The New York Stock Exchange generates about one terabyte of new trade data • Zynga processes 1 Petabyte of content • 30 billion pieces of content were added to Facebook • 2 billion videos are watched in Youtube • 2.5 quintillion bytes of data is created
    6. 6. Some BIG facts – What happens every minute? Courtesy: http://practicalanalytics.files.wordpress.com
    7. 7. Big data – Objective Effectively store, manage and analyze all the data to create meaningful information out of it
    8. 8. Big data – Sources
    9. 9. Big data – 3 V’s of Big data Courtesy: bigdatablog.emc.com
    10. 10. Big data – 3 + 1 V’s of Big data Courtesy: http://www.datasciencecentral.com/
    11. 11. Big data - Volume Volumes are in: • Terabytes • Exabytes • Petabytes • Zetabytes Courtesy: http://www.datasciencecentral.com/
    12. 12. Big data - Volume Name Value 1 GB 1 Terabyte (TB) 1024 GB 1 Petabyte (PB) 1,048,576 GB 1 Exabyte (EB) 1,073,741,824 GB 1 Zeta byte (ZB) 1,099,511,627,776 GB 1 Yottabyte (YB) Courtesy: http://www.datasciencecentral.com/ 1,073,741,824 bytes 1,125,899,906,842,624 GB
    13. 13. Big data - Velocity • Live Stream • Real time • Batch Courtesy: http://www.datasciencecentral.com/
    14. 14. Big data - Variety • Structured (Tables) • Unstructured (Tweets, SMSes) • Semi-structured (Logfiles, RFID) Courtesy: http://www.datasciencecentral.com/
    15. 15. Big data - Veracity • This kind of data is often overlooked • It is now considered as important as 3 V’s of Big Data • Effort to clean up data is rather not given importance • Poor data quality costs the U.S. economy around $3.1 trillions a year Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS
    16. 16. Big data Technologies Technologies & Solution providers: • Storage (MS SqlServer, Apache Hadoop, Mongo DB) • Processing (MapReduce, Impala) • Analytics (SAS, R, Business Intelligence) • Integration (Flume, Sqoop)
    17. 17. Big data - Opportunities • • • • • Storage Processing Analytics Integration Solution
    18. 18. Big data – Major Players
    19. 19. Big data – Questions?
    20. 20. Big data – Thank you !!!

    ×