BigData

Basic Information On BigData

Published in: Technology
Transcript of "BigData"

  1. Beyond The Hype
  2. • Big Data • Big data: a growing torrent • 4 V's of Big Data • Big Data vs. DWH-DM • Challenges of Large Scale Social Network Analysis • Where does it come from? • Big data Technologies • Applications of Big data Analysis • Conclusion
  3. Big data • “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze. • This definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common there. • As technology advances over time, the size of datasets that qualify as Big data will also increase. • With these caveats, Big data will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
  4. Big data: a growing torrent • $600 to buy a disk drive that can store all of the world’s music. • 5 billion mobile phones in use in 2010. • 30 billion pieces of content shared on Facebook every month. • 40% projected growth in global data generated per year vs. 5% growth in global IT spending. • 235 terabytes of data collected by the US Library of Congress by April 2011. • 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress.
  5. 4 V's of Big Data • Volume: data is bigger than ever. • Velocity: data arrives ever faster, e.g. complex real-time data. • Variety: data types are multiplying, e.g. unstructured video and voice. • Variability: data types and formats also differ. [Diagram: Big Data surrounded by Volume, Velocity, Variety and Variability]
  6. Big Data vs. DWH-DM • Big Data – Multitude of data types • Structured, semi-structured and unstructured – Demographic, psychographic, transactional – Call center data, social media data, web log data, sensor networks etc. – Requires new storage mechanisms, e.g. Hadoop – High dimensionality – Online versions of algorithms • Online services such as eBay, Yahoo, Amazon and Facebook have transformed/created big data
  7. Big Data vs. DWH-DM • Areas like genomics, astronomy, military surveillance and RFID technology are also contributing to the explosive growth of the field. • A jet engine’s sensors send terabytes of data every hour, which can be used to build predictive models for repair cycles. Understanding when repairs should be done, instead of doing traditional preventive maintenance at set intervals, could be worth billions of dollars. • The challenge in big data analytics is to dig deeply, quickly and widely. • DWH-DM – Structured data – Off-line algorithms
  8. Challenges of Large Scale Social Network Analysis • Social networking sites like Facebook, YouTube, Orkut and Twitter are among the most popular sites on the internet. • Users of these sites form a social network (SN), which provides a powerful means of sharing, organizing, and finding content and contacts. • However, the rate at which SNs are growing poses many latent challenges in maintaining the stability of their underlying systems and the members associated with them.
  9. Challenges of Large Scale Social Network Analysis • Social Networks (SNs) are living networks that daily give birth to data traces which can be up to exabytes in volume. • For example, Facebook produces more than a petabyte of data per day. Even its logging data exceeds 25 terabytes per day. • Google creates as much information (social blogs and Orkut) in two days now as we did from the dawn of man through 2003, i.e., one exabyte of data. • Analysts need to analyze this huge volume of SN data in limited time to support system management activities.
  10. Big data and Big Brother • Perhaps one of the biggest contributors to big data, however, is social networking. • People themselves have become contributors of information as they increasingly use services such as Facebook and LinkedIn to connect with each other. • “LinkedIn is a particularly interesting target, given the professional nature of its audience. By analyzing LinkedIn network information, we can learn a lot about individuals and the people that they know.”
  11. • While it may be difficult to manipulate big data at a grand scale, it is relatively easy, given the right tools and techniques, to analyze small subsets (such as personal networks of contacts) for potentially useful results. • We can do this at a micro-analytic level, where we mine profiles for snippets of information, and at the macro-analytic level, where we look at patterns in the data. • “Even when people are not part of your network, a properly filled-out profile reveals their job title, where they worked in the past, and where they were educated.”
  12. Where does it come from? • In the global marketplace, businesses, suppliers and customers are creating and consuming vast amounts of information.
  13. Big Data (cont.) • Gartner predicts that enterprise data in all forms will grow 650% over the next 5 years. • According to IDC, the world's volume of data doubles every 18 months. • This flood of data is referred to as “information overload,” “data deluge” and “big data”. • Big data creates a challenge for business leaders.
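The two growth figures on slide 13 can be put on a common footing with a little compound-growth arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes "grow 650%" means an increase of 650% (a 7.5x total), which the slide does not spell out.

```python
def annual_rate_from_total_growth(total_growth_pct, years):
    """Compound annual growth rate implied by a total percentage increase.

    Assumption: a growth of P% means the final size is (1 + P/100) times
    the starting size.
    """
    factor = 1 + total_growth_pct / 100.0
    return factor ** (1.0 / years) - 1


def growth_factor_from_doubling(doubling_months, months):
    """Total growth factor after `months` if the volume doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    gartner_rate = annual_rate_from_total_growth(650, 5)
    idc_factor = growth_factor_from_doubling(18, 60)
    print(f"Gartner figure as an annual rate: {gartner_rate:.1%}")
    print(f"IDC doubling rate as a 5-year factor: {idc_factor:.1f}x")
```

Under these assumptions the two predictions are of the same order: roughly 50% growth per year in one case, and roughly a tenfold increase over five years in the other.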
  14. NoSQL Databases • Most of the organizations that built data platforms have found it necessary to go beyond the relational database model to tackle big data, because relational databases become ineffective at this scale. • Managing sharding and replication across a horde of database servers is difficult and slow. • To store huge datasets effectively, a new breed of databases has been developed. These databases are called NoSQL databases, or non-relational databases.
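To make the sharding difficulty on slide 14 concrete, here is a minimal sketch of hash-based sharding. It is an illustration under stated assumptions, not how any particular database does it: `shard_for` and `moved_keys` are hypothetical helper names, and simple modulo placement is assumed.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a record key to a shard index using a stable hash (modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_shards, new_shards):
    """Return the keys whose shard changes when the shard count changes."""
    return [k for k in keys
            if shard_for(k, old_shards) != shard_for(k, new_shards)]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    moved = moved_keys(keys, 4, 5)
    print(f"{len(moved)} of {len(keys)} keys move when going from 4 to 5 shards")
```

With modulo placement, adding a single server typically relocates a large share of the keys, which is one reason manual sharding across many relational servers is hard to manage.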
  15. NoSQL Databases • Many of the NoSQL databases are the logical descendants of Google’s BigTable and Amazon’s Dynamo. • These are designed to be distributed across many nodes, to provide consistency (often eventual rather than strict) and to have a very flexible schema.
  16. Popular NoSQL databases • Cassandra: developed at Facebook, in production use at Twitter, Rackspace, Reddit, and other large sites. Cassandra is designed for high performance, reliability, and automatic replication. It has a very flexible data model. A new startup, Riptano, provides commercial support. • HBase: part of the Apache Hadoop project, and modeled on Google’s BigTable. Suitable for extremely large databases (billions of rows, millions of columns), distributed across thousands of nodes. Along with Hadoop, commercial support is provided by Cloudera.
  17. Prevalence of Big Data • Big data is not limited to big companies like Facebook and Google. • According to a McKinsey Global Institute study in 2011: • Most investment firms in the U.S. with fewer than 1,000 employees have 3.8 petabytes of data stored. • Companies in all sectors have at least 100 terabytes stored.
  18. Big Data Formats
  19. Big data Technologies • Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis. • The above definition incorporates all types of data (e.g., real-time, analytic) managed by next-generation systems.
  20. • The MapReduce approach is basically a divide-and-conquer strategy for distributing an extremely large problem across an extremely large computing cluster. • In the “map” stage, a programming task is divided into a number of identical subtasks, which are then distributed across many processors. • The intermediate results are then combined by a reduce task. • MapReduce provides a solution to Google’s biggest problem, i.e., creating large searches.
  21. • MapReduce has proven to be widely applicable to many large data problems, ranging from search to machine learning. • The most popular open source implementation of MapReduce is the Hadoop project.
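The map and reduce stages described on slides 20 and 21 can be sketched in a single process with a word count, the usual teaching example. This is a toy illustration, not Hadoop's API: the function names are hypothetical, and a real framework would run the map tasks on many machines and may run several reduce tasks in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run the identical map subtask on every chunk, then combine the results."""
    pairs = []
    for chunk in chunks:  # each iteration stands in for one distributed map task
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big brother", "data deluge"]))
```

Because every map subtask is identical and independent, the loop in `word_count` is the part a cluster can spread across processors.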
  22. Applications of Big data Analysis • Facebook and LinkedIn use patterns of friendship relationships to suggest other people you may know, or should know, with frightening accuracy. • Amazon saves your searches, correlates what you search for with what other users search for, and uses it to create surprisingly appropriate recommendations. • Medical researchers sift through the health records of thousands of people to try to identify useful correlations between medical treatments and health outcomes.
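One simple way to use patterns of friendship relationships, as in the first bullet of slide 22, is to count mutual friends. The sketch below is a hypothetical illustration only; it makes no claim about how Facebook or LinkedIn actually build their suggestions, and the graph and names are invented sample data.

```python
from collections import Counter


def suggest_friends(graph, user, top_n=3):
    """Rank non-friends of `user` by the number of friends they share with `user`."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone they are already connected to.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    return [name for name, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical friendship graph: each person maps to a set of friends.
    graph = {
        "alice": {"bob", "carol"},
        "bob": {"alice", "dave", "erin"},
        "carol": {"alice", "dave"},
        "dave": {"bob", "carol"},
        "erin": {"bob"},
    }
    print(suggest_friends(graph, "alice"))
```

Here dave shares two friends with alice and erin shares one, so dave is ranked first. At social-network scale the same counting would be expressed as a distributed job rather than an in-memory loop.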
  24. Conclusion • As data volumes are growing exponentially, so is the concern over data preservation, access, dissemination, and usability. Many agencies have taken initiatives to research areas such as automated analysis techniques, data mining, machine learning, privacy, and database interoperability, and these will help to identify how big data can enable science in new ways and at new levels.
