What is Big Data?
Big data is a term for data sets so large and complex that they are difficult to store,
access or process with traditional database applications and tools. Such data exceeds
the processing capacity of conventional database systems: it is too big (petabytes or
exabytes), it moves too fast, or it does not fit the structures of conventional database
architectures. The data is typically loosely structured, and it is often incomplete and
hard to access.
Specifically, Big Data concerns the creation, retrieval, manipulation and analysis of
data that is exceptional in terms of volume, velocity and variety:
1. Volume – Facebook consumes more than 500 TB of data in one day. Google
receives 2 million search queries per minute. 40 terabytes of data are generated
every second by nuclear physics experiments at the Large Hadron Collider at
CERN. This volume presents the most immediate challenge to traditional IT
structures. It demands scalable storage and a distributed approach to querying.
2. Velocity – This is the rate at which data is generated and must be processed.
Many multinational corporations and other organizations capture click streams
from websites (Google, Yahoo, Facebook, Microsoft, etc.) and use that streaming
data to make purchase recommendations, in the form of ads, to web visitors. The
analysis has to make sense of streaming data as it arrives, and at the same time
produce results and take actions, all in real time.
3. Variety – Big data is not just strings or numbers. It also includes 3D data,
audio, video, pictures, log files, GPS data, etc. Conventional databases were
designed for smaller volumes of structured data with predictable, consistent
structures. As the number of users grows, a traditional RDBMS can become a
liability for organizations, making it harder to serve those users.
Every enterprise needs to understand Big data, and how it affects them. Standard
tools and procedures are not designed to analyze and search massive datasets.
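
To make the velocity point above more concrete, here is a minimal sketch of how a
click stream might be summarized while it is still arriving. It keeps only the clicks
from the last 60 seconds and reports the most visited page, which a recommendation
step could then act on. The class name SlidingWindowCounter, the page names and the
timestamps are all hypothetical and chosen for illustration; production systems would
normally use a dedicated stream-processing platform rather than a single in-memory
structure like this.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts page clicks seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, page) in arrival order
        self.counts = Counter()  # page -> clicks currently inside the window

    def add(self, timestamp, page):
        """Record one click, then drop clicks that have left the window."""
        self.events.append((timestamp, page))
        self.counts[page] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Oldest events sit at the left end of the deque.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_page = self.events.popleft()
            self.counts[old_page] -= 1
            if self.counts[old_page] == 0:
                del self.counts[old_page]

    def top(self, n=1):
        """Return the n most clicked pages inside the current window."""
        return self.counts.most_common(n)


# Hypothetical click stream: (seconds since start, page visited).
clicks = [(0, "shoes"), (1, "shoes"), (2, "hats"), (61, "hats"), (62, "hats")]

counter = SlidingWindowCounter(window_seconds=60)
for ts, page in clicks:
    counter.add(ts, page)

print(counter.top(1))  # [('hats', 2)]
```

Each click is handled once when it arrives and once when it expires, so the cost per
event stays small no matter how long the stream runs.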
Big Data requires exceptional technology to process large amounts of data efficiently
within an acceptable amount of time. Suitable technologies include massively parallel
processing databases, search-based applications, data mining grids, distributed file
systems and databases, and cloud-based infrastructure.
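
The idea shared by most of these technologies is to split the data into partitions,
process each partition independently, and then merge the partial results. The sketch
below illustrates that pattern on a single machine, with worker threads standing in
for cluster nodes. It is an illustration under those assumptions, not the API of any
particular product; the function names and the sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Work done independently on one partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """Combine the per-partition results into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def parallel_word_count(records, num_partitions=4):
    partitions = split_into_partitions(records, num_partitions)
    # Threads stand in for cluster nodes here; on a real cluster each
    # partition would be stored and processed on a different machine.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_words, partitions))
    return merge_counts(partials)


# Hypothetical log lines used only for illustration.
log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
    "info disk ok",
]

totals = parallel_word_count(log_lines, num_partitions=2)
print(totals["disk"], totals["error"])  # 3 2
```

Because each partition is processed without reference to the others, adding more
workers (or machines) lets the same code handle more data, which is the property the
technologies listed above rely on.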
Big Data Software:
1. Hadoop - Apache Foundation.
It is an open source software project that enables the distributed processing of
large data sets across clusters of commodity servers. Hadoop makes it possible to
run applications on systems with thousands of nodes involving thousands of
terabytes. Rather than relying on high-end hardware, the resiliency of these clusters
comes from the software’s ability to detect and handle failures at the application layer.
The Hadoop framework is used by major players including Google, Yahoo and IBM,
largely for applications involving search engines and advertising. The preferred
operating systems are Windows and Linux but Hadoop can also work with BSD and
2. MongoDB - MongoDB, Inc.
It is a document-oriented database system classified as a NoSQL* database. MySQL
is queried using SQL, while MongoDB stores data as BSON (Binary JSON**) documents.
It is a handy tool for smaller database requirements. MongoDB supports complex
operations such as joins and indexing much more easily and efficiently compared to
*A NoSQL (Not Only SQL) database provides a mechanism for storage and
retrieval of data that is modeled by means other than the tabular relations used in
relational databases.
**JSON is an open standard format that uses human-readable text to transmit data
objects consisting of attribute–value pairs.
3. Splunk - Splunk Inc.
Splunk is an advanced IT search tool that offers users, administrators, and
developers the ability to instantly search all data generated by applications, servers,
and network devices in the IT infrastructure. It generates reports, graphs, alerts and
visualizations from the data which it captures and correlates in a repository. Splunk
turns machine data into valuable insights no matter what business you're in.
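
As a closing illustration of the document model mentioned in the MongoDB entry and
its footnotes, the sketch below stores the same hypothetical customer data in two
ways: as one JSON document made of attribute–value pairs, and as two flat tables
linked by an id, as a relational database would hold it. It uses only Python's
standard json module and does not call MongoDB itself; all names and values are
invented for the example.

```python
import json

# One self-contained document: the orders are nested inside the customer.
customer_document = {
    "name": "Asha",
    "email": "asha@example.com",
    "orders": [
        {"item": "laptop", "price": 900},
        {"item": "mouse", "price": 20},
    ],
}

# The same data in tabular form: two tables linked by customer_id.
customers_table = [(1, "Asha", "asha@example.com")]  # (customer_id, name, email)
orders_table = [(1, "laptop", 900), (1, "mouse", 20)]  # (customer_id, item, price)


def document_total(doc):
    """Sum the order prices by reading the nested list directly."""
    return sum(order["price"] for order in doc["orders"])


def tabular_total(customer_id, orders):
    """Sum the order prices by matching rows on customer_id (a join-like step)."""
    return sum(price for cid, _item, price in orders if cid == customer_id)


# JSON is plain text, so a document survives a round trip unchanged.
text = json.dumps(customer_document)
restored = json.loads(text)

print(document_total(restored), tabular_total(1, orders_table))  # 920 920
```

The document form keeps related data together and needs no fixed schema, while the
tabular form needs a matching step across tables to answer the same question.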