History of Big Data
What is Big Data?
In essence, Big Data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. It usually covers data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process within a tolerable elapsed time [1]. The “size” of Big Data is a constantly moving target; as per a recent report, it ranges from a few dozen terabytes to many petabytes of data.
The story of how data became big starts many years before the current buzz around big data. About seventy years ago we encountered the first attempts to quantify the growth rate in the volume of data, or what has popularly been known as the Information Explosion (a term first used in 1941). The history of Big Data as a term may be brief, but many of the foundations it is built on were laid many years ago [2]. Long before computers (as we know them today) were commonplace, the idea that we were creating an ever-expanding body of knowledge ripe for analysis was popular in academia.
Now, let’s look at a detailed account of the major milestones in the evolution of the idea of “big data” and of observations pertaining to the data or information explosion:
1932 Skipping the important milestone of the population boom would not do justice to the history of Big Data. Information overload continued with the boom in the US population, the issuing of Social Security numbers, and the general growth of knowledge (research), which demanded more thorough and organized record-keeping.

[1] Source: Wikipedia
[2] Link: https://www.linkedin.com/pulse/brief-history-big-data-everyone-should-read-bernard-marr
1941 Scholars began referring to this incredible expansion of
information as the “Information Explosion”. First referenced by
the Lawton Constitution (newspaper) in 1941, the term was
expanded upon in a New Statesman article in March 1964, which
referred to the difficulty of managing the amount of information
available.
1944 The first warning flag on the growth of knowledge storage and the retrieval problem came in 1944, when Fremont Rider, a Wesleyan University librarian, estimated that American university libraries were doubling in size every sixteen years. At this growth rate, Rider speculated that the Yale Library in 2040 would have “approximately 200,000,000 volumes, which will occupy over 6,000 miles of shelves… [requiring] a cataloging staff of over six thousand persons.”
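Rider’s projection is simple compound growth, which makes it easy to check. A short sketch of the arithmetic (the implied 1944 starting size is derived here, not stated in the source):

```python
# Rider's model: library collections double every 16 years.
# From 1944 to 2040 is 96 years, i.e. 96 / 16 = 6 doublings,
# a 2**6 = 64-fold increase. Working backwards from his
# 200,000,000-volume figure gives the starting size it implies.
doublings = (2040 - 1944) / 16        # 6.0
growth = 2 ** doublings               # 64.0
implied_start = 200_000_000 / growth  # about 3.1 million volumes
print(doublings, growth, round(implied_start))
```

So his figure is consistent with a collection of roughly three million volumes in 1944 growing 64-fold by 2040.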
Schematic showing a general communication system [3]

[3] Link: http://www.winshuttle.com/big-data-timeline/
1948 Claude Shannon published “A Mathematical Theory of Communication”, which established a framework for determining the minimal data requirements for transmitting information over a noisy (imperfect) channel. This landmark work enabled much of today’s communication infrastructure; without this understanding, data would be “bigger” than it is today.
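One well-known result of this framework, the Shannon–Hartley theorem, gives the maximum rate at which a noisy channel can carry information. A minimal sketch (the telephone-line numbers are illustrative, not from the source):

```python
from math import log2

def channel_capacity(bandwidth_hz: float, snr: float) -> float:
    """Shannon-Hartley limit: capacity of a noisy channel in bits/second."""
    return bandwidth_hz * log2(1 + snr)

# A 3 kHz line with a signal-to-noise ratio of 1000 (30 dB) can carry
# at most about 30 kbit/s, no matter how clever the encoding.
print(round(channel_capacity(3000, 1000)))  # 29902
```

The point of the theorem is the hard ceiling: no encoding scheme can push more bits through the channel than this formula allows.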
1956 The concept of virtual memory was developed by German
physicist Fritz-Rudolf Guntsch as an idea that treated finite storage
as infinite. Storage, managed by integrated hardware and software
to hide the details from the user, permitted us to process data
without the hardware memory constraints that previously forced
the problem to be partitioned.
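Guntsch’s idea can be sketched in a few lines: a page table maps small fixed-size chunks of a large virtual address space onto whatever physical frames happen to be available, hiding the mapping from the program. This toy translator (page size and table contents are hypothetical) shows the mechanism:

```python
PAGE_SIZE = 4096  # bytes per page (a common choice; illustrative here)

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 11}

def translate(virtual_address: int) -> int:
    """Translate a virtual address to a physical one via the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table[page]  # real hardware would page-fault if absent
    return frame * PAGE_SIZE + offset

# Virtual address 4100 sits at offset 4 in page 1, which lives in frame 3.
print(translate(4100))  # 12292
```

The program only ever sees virtual addresses, so storage can be swapped in and out underneath it, which is exactly what lets finite memory be treated as if it were infinite.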
Information Overload [4]

[4] Image source: Google images
1961 Information scientist Derek Price generalized Rider’s findings to include almost the entire range of scientific knowledge.
The scientific revolution, as he called it, was responsible for the
rapid communication of new ideas as scientific information. This
rapid growth was in the form of new journals doubling every 15
years.
1963 In the early 1960s, Price observed that the vast amount of scientific research was too much for humans to keep abreast of. Abstract journals, which were created in the late 1800s as a way to manage the increasing knowledge base, were growing along the same trajectory and had already reached a “critical magnitude”. They were no longer a storage or organization solution for information.
1966 Around this time, centralized computing systems entered the scene. Information was booming not only in the science sector but in the business sector as well. Due to the information influx of the 1960s, most organizations began to design, develop and implement centralized computing systems that allowed them to automate their inventory systems.
1970 Edgar F. Codd, an Oxford-educated mathematician working at the IBM Research Lab, published a paper showing how information stored in large databases could be accessed without knowing how the information was structured or where it resided in the database. Until then, retrieving information required relatively sophisticated computer knowledge, or even the services of specialists, a time-consuming and expensive task. Today, most routine data transactions (accessing bank accounts, using credit cards, trading stocks, making travel reservations, buying things online) all use structures based on relational database theory.
A relational database system [5]
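The shift Codd introduced can be sketched with Python’s built-in sqlite3 module (the table and values here are hypothetical): a declarative query states what data is wanted, with no reference to where or how it is physically stored.

```python
import sqlite3

# A tiny in-memory relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("Ada", 120.0), ("Boole", 75.5), ("Codd", 300.0)])

# Declarative retrieval: we describe the rows we want; the database
# engine decides how to find them on disk or in memory.
rows = conn.execute(
    "SELECT owner, balance FROM accounts WHERE balance > 100 ORDER BY owner"
).fetchall()
print(rows)  # [('Ada', 120.0), ('Codd', 300.0)]
```

Before the relational model, the equivalent lookup would have required code written against the specific physical layout of the records.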
1976 In the mid-1970s, Materials Requirements Planning (MRP) systems were designed as a tool to help manufacturing firms organize and schedule their information. Around the same time, PCs were gradually gaining popularity, which marked a shift in focus toward business processes and accounting capabilities. Companies like Oracle and SAP were founded around this time.

[5] Image source: IBM.com
1983 As technology continued to advance, every industry began to benefit from new ways to organize, store and produce data.

Information Explosion [6]
1996 Digital storage became more cost-effective than paper for storing data. The boom in data also brought more challenges to ERP vendors. The need to redesign ERP products, including breaking the barriers of proprietorship and customization, forced vendors to embrace collaborative business over the internet in a seamless manner.
1997 The term “Big Data” was used for the first time in an article by NASA researchers Michael Cox and David Ellsworth. The pair claimed that the rise of data was becoming an issue for current computer systems. This was also known as the “problem of big data”.

[6] Image source: IBM.com

The 4 V’s of Big Data [7]
1998 By the end of the 90s, many businesses began to believe that their data mining systems were not up to the mark and still needed improvement. Business workers could not access the data they needed or get answers from searches, and IT resources were not readily at their disposal: whenever employees needed information, they had to call the IT department.
2001 The acronym SaaS (Software as a Service) first appeared around this time. It means an “on-demand software” delivery model that is licensed on a subscription basis and centrally hosted.

[7] Image source: IBM.com

Software as a Service [8]
2005 SaaS companies began appearing on the scene to offer an alternative to Oracle and SAP that was more focused on end-user usability. Adding to this was the creation of a new open-source framework named Hadoop. Free to download, use, enhance and improve, Hadoop is a 100% open-source way of storing and processing data that enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data with extreme scalability.
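Hadoop’s core programming model, MapReduce, can be sketched in plain Python (the word-count example and input lines are illustrative): a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The real framework runs these same three phases distributed across many servers.

```python
from collections import defaultdict

def map_step(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_step(word, counts):
    """Reduce phase: aggregate all counts emitted for one word."""
    return word, sum(counts)

lines = ["big data is big", "data about data"]

# Shuffle phase: group intermediate pairs by key, as Hadoop does
# between its map and reduce phases.
groups = defaultdict(list)
for line in lines:
    for word, count in map_step(line):
        groups[word].append(count)

result = dict(reduce_step(w, c) for w, c in groups.items())
print(result)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because each map call touches one line and each reduce call touches one key, both phases parallelize naturally, which is what lets the model scale across cheap commodity servers.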
[8] Image source: Google images

2009 Business Intelligence became a top priority for Chief Information Officers in 2009. Tim Berners-Lee, director of the World Wide Web Consortium (W3C), was the first to use the term “linked data” during a presentation on the subject at the TED 2009 conference. Linked Data is a set of best practices for using the Web to create links between structured data.
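The core idea can be illustrated in a few lines of Python (the URIs below are hypothetical examples, not real identifiers): facts are expressed as subject-predicate-object triples, and because subjects and objects are web identifiers, one dataset’s facts can link into another’s.

```python
# Each fact is a (subject, predicate, object) triple whose subject and
# object are URIs, so independent datasets can reference the same things.
triples = [
    ("http://example.org/person/timbl", "directorOf", "http://example.org/org/w3c"),
    ("http://example.org/org/w3c", "standardizes", "http://example.org/tech/rdf"),
]

def follow(start, facts):
    """Follow outgoing links from one URI through the triple store."""
    return [(p, o) for s, p, o in facts if s == start]

print(follow("http://example.org/org/w3c", triples))
```

Traversing from one URI to the next is what makes the data “linked”: each answer is itself an identifier that can be looked up in turn.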
2011 By this time, companies with more than 1,000 employees in nearly all sectors of the US economy stored, on average, at least 200 terabytes of data. The authors of the report also estimated that the securities and investment industry led in stored data per organization, and calculated that 7.4 exabytes of original data were saved by enterprises and 6.8 exabytes by consumers in 2010 alone.
2012 After the launch of IPv6, identifying and locating computers on networks and routing traffic across the internet became much faster. Technologically advanced features, such as the ability to generate reports from in-memory databases that provide faster and more predictable performance, were also on the rise. Businesses began to implement new in-memory technology such as SAP HANA to analyze and optimize mass quantities of data. Companies became ever more reliant on data as a business asset for competitive advantage, with big data leading the charge as arguably the most important new technology to understand and make use of in day-to-day business.
How does Hexanika make use of Big Data?
Hexanika is a FinTech big data software company which has
developed an end-to-end solution for financial institutions to
address data sourcing and reporting challenges for regulatory
compliance. Hexanika’s innovative solution improves data quality and keeps regulatory reporting in harmony with dynamic regulatory requirements, new developments and the latest regulatory updates.
Hexanika’s unique Big Data deployment approach, carried out by experienced professionals, simplifies, optimizes and reduces the cost of deployment.
Hexanika addresses Big Data using its unique product and
solutions. To know more about us,
see: http://hexanika.com/company-profile/
Feel free to get in touch with our experts to know more
at: http://hexanika.com/contact-us-big-data-company/
CONTACT US
USA
249 East 48 Street,
New York, NY 10017
Tel: +1 646.733.6636
INDIA
Krupa Bungalow 1187/10,
Shivaji Nagar, Pune 411005
Tel: +91 9850686861
Email: info@hexanika.com