.Netter Tech Summit - 2015
RUET
Md. Delwar Hossain
Sr. Software Engineer, desme Bangladesh.
Skype: delwar_databiz
E-mail: twinkle_023020@yahoo.com
Linkedin: delwar-hossain
Unit Name   Symbol   Size
Kilobyte    KB       10^3
Megabyte    MB       10^6
Gigabyte    GB       10^9
Terabyte    TB       10^12
Petabyte    PB       10^15
Exabyte     EB       10^18
Zettabyte   ZB       10^21
Yottabyte   YB       10^24
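The decimal (SI) sizes in the table above are all powers of 10^3, so they can be computed from a unit's position; a minimal sketch (the `sizeOf` helper is illustrative, not from any library):

```javascript
// Each SI unit is 10^3 times the previous one, so the size of a unit
// is 10 raised to (3 * its 1-based position in this list).
const units = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"];
const sizeOf = (symbol) => 10 ** (3 * (units.indexOf(symbol) + 1));

console.log(sizeOf("GB")); // 1000000000
```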
 Limitations of traditional systems
 Traditional software tools cannot handle files at this scale:
100 MB document: unable to send
100 GB image: unable to view
100 TB video: unable to edit
Data comes from many sources:
 Social media sites
 Sensors
 Business transactions
 Location-based services
 Volume: large volumes of data
 Velocity: quickly moving data
 Variety: structured, unstructured, images, audio, video, etc.
 Business value: outcomes tied to the
business strategy and the business
decisions that result from it.
 Higher productivity, faster time to
complete tasks
 Lower total cost of ownership and
greater efficiencies in IT
 Operational: NoSQL
 Analytical: Hadoop
 Scalability
 Semi-structured and unstructured data
 High velocity
 Schemaless: the data structure is not predefined
 Focus on retrieval of data and appending new data
 Focus on key-value data stores that can be used to locate data
objects
 Focus on supporting storage of large quantities of unstructured
data
 SQL is not used for storage or retrieval of data
 No ACID guarantees (atomicity, consistency, isolation, durability)
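The key-value, schemaless access pattern described above can be sketched in a few lines. This is a minimal illustration (an in-memory `Map`, not a real NoSQL engine): values are arbitrary documents with no shared schema, and reads go straight to the key with no SQL.

```javascript
// Minimal key-value sketch: each value is an arbitrary document,
// so no schema is declared up front.
const store = new Map();

// Append new data: values need not share a structure.
store.set("user:1", { name: "delwar", role: "engineer" });
store.set("post:1", { topic: "Big Data Concept", tags: ["tools", "database"] });

// Retrieval is a direct key lookup -- no SQL, no joins.
const post = store.get("post:1");
console.log(post.topic); // "Big Data Concept"
```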
 Built for cloud
 Scale out architecture
 Agility afforded by cloud
computing
 Sharding automatically
distributes data evenly across
multi-node clusters
 Automatically manages
redundant servers. (replica sets)
Horizontal scalability
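The sharding bullet above can be sketched as hash-based key placement: each document's shard key is hashed to pick a node, which spreads data evenly across the cluster. The node names and hash function are illustrative assumptions, not a real deployment.

```javascript
// Hypothetical multi-node cluster; names are illustrative.
const nodes = ["node-a", "node-b", "node-c"];

function shardFor(key) {
  // Simple deterministic string hash (unsigned 32-bit).
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return nodes[h % nodes.length];
}

// The same shard key always routes to the same node,
// so reads know exactly where to look.
console.log(shardFor("user:42") === shardFor("user:42")); // true
```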
Document Oriented
{
  author: "delwar",
  date: new Date(),
  topics: "Big Data Concept",
  tag: ["tools", "database"]
}
Visit: www.mongodb.org
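A document like the one above can be queried by any field without a schema. A minimal sketch in plain JavaScript (an in-memory array standing in for a collection, not the MongoDB driver) of the access pattern a document database serves natively:

```javascript
// Documents with flexible structure, stored together like a collection.
const posts = [
  { author: "delwar", topics: "Big Data Concept", tag: ["tools", "database"] },
  { author: "reader", topics: "Intro", tag: ["basics"] },
];

// Find every document whose tag array contains "tools".
const hits = posts.filter((doc) => doc.tag.includes("tools"));
console.log(hits.length); // 1
```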
Editor's Notes

  • #8 Volume. A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US; the proliferation of smart phones, and the data they create and consume, along with sensors embedded into everyday objects, will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information. Velocity. Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds; machine-to-machine processes exchange data between billions of devices; infrastructure and sensors generate massive log data in real time; online gaming systems support millions of concurrent users, each producing multiple inputs per second. Variety. Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
  • #10 Selecting a Big Data Technology: Operational vs. Analytical. The Big Data landscape is dominated by two classes of technology: systems that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored; and systems that provide analytical capabilities for retrospective, complex analysis that may touch most or all of the data. These classes of technology are complementary and frequently deployed together. Operational and analytical workloads for Big Data present opposing requirements, and systems have evolved to address their particular demands separately and in very different ways. Each has driven the creation of new technology architectures. Operational systems, such as the NoSQL databases, focus on servicing highly concurrent requests while exhibiting low latency for responses operating on highly selective access criteria. Analytical systems, on the other hand, tend to focus on high throughput; queries can be very complex and touch most if not all of the data in the system at any time. Both systems tend to operate over many servers in a cluster, managing tens or hundreds of terabytes of data across billions of records. Operational Big Data. For operational Big Data workloads, NoSQL Big Data systems such as document databases have emerged to address a broad set of applications, and other architectures, such as key-value stores, column family stores, and graph databases, are optimized for more specific applications. NoSQL technologies, which were developed to address the shortcomings of relational databases in the modern computing environment, are faster and scale much more quickly and inexpensively than relational databases. Critically, NoSQL Big Data systems are designed to take advantage of new cloud computing architectures that have emerged over the past decade to allow massive computations to be run inexpensively and efficiently.
This makes operational Big Data workloads much easier to manage, and cheaper and faster to implement. In addition to user interactions with data, most operational systems need to provide some degree of real-time intelligence about the active data in the system. For example, in a multi-user game or financial application, aggregates for user activities or instrument performance are displayed to users to inform their next actions. Some NoSQL systems can provide insights into patterns and trends based on real-time data with minimal coding and without the need for data scientists and additional infrastructure. Analytical Big Data. Analytical Big Data workloads, on the other hand, tend to be addressed by MPP database systems and MapReduce. These technologies are also a reaction to the limitations of traditional relational databases and their inability to scale beyond the resources of a single server. Furthermore, MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL. As applications gain traction and their users generate increasing volumes of data, there are a number of retrospective analytical workloads that provide real value to the business. Where these workloads involve algorithms that are more sophisticated than simple aggregation, MapReduce has emerged as the first choice for Big Data analytics. Some NoSQL systems provide native MapReduce functionality that allows analytics to be performed on operational data in place. Alternatively, data can be copied from NoSQL systems into analytical systems such as Hadoop for MapReduce.
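The MapReduce model mentioned in the note above can be sketched as a map step that emits (word, 1) pairs and a reduce step that sums counts per key. Real frameworks such as Hadoop distribute these phases across a cluster; this single-process sketch only shows the shape of the model.

```javascript
// Map phase: emit a (word, 1) pair for every word in the input.
function mapPhase(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean).map((w) => [w, 1]);
}

// Reduce phase: sum the emitted counts per key.
function reducePhase(pairs) {
  const counts = {};
  for (const [word, n] of pairs) counts[word] = (counts[word] || 0) + n;
  return counts;
}

const counts = reducePhase(mapPhase("big data big value"));
console.log(counts.big); // 2
```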