Tools and techniques adopted for Big Data analytics.
Dept. of Industrial Engineering and Management, Bangalore Institute of Technology
INDEX
- OVERVIEW
- INTRODUCTION
- BIG DATA ANALYSIS PIPELINE
-Data Acquisition and Recording
-Information Extraction and Cleaning
-Data Integration, Aggregation, and Representation
-Query Processing, Data Modelling, and Analysis
-Interpretation
- FIELDS OF RELEVANCE
- TOOLS AND TECHNIQUES IN DATA ANALYTICS: AN OVERVIEW
- A/B testing
- Crowdsourcing
- Machine learning
- CASE STUDIES
- Shoppers Stop
- Airbnb
- Indian Elections
- 15 Upcoming BIG DATA startups
Overview
“Information is the oil and data analytics is the combustion engine”- Peter Sondergaard
Since the invention of the World Wide Web, information has become accessible at a
minute level, and this has created a great deal of unstructured data. Technology and
mathematical pioneers predicted the boom in information even before modern times. The
scenarios that led to such a prediction included the ever-expanding information in the
field of science, the maintenance of census details, and the huge number of journals and
publications, all of which made storage a tedious task. Today information is expanding
faster still, thanks to the connectivity that the internet and mobile phones bring to society.
This report is a glance into the tools and techniques that organizations adopt today to
find optimized solutions to various problems, to decide which functions to incorporate into
products, and to add value to each decision taken. The initial effort was to
understand the functional aspects of big data; the report then surveys the upcoming tools
as a whole and focuses on three major concepts among them.
The study then analyses three cases of different magnitude, scope and location to
understand the application of big data at a fundamental and practical level. The study has
focussed on two Indian cases and an international case.
The study also aims at understanding upcoming trends in the analytics environment by
listing 15 startups based solely on data science.
2) Introduction:
The term “Big Data,” which spans computer science and statistics/econometrics,
probably originated in lunch-table conversations at Silicon Graphics Inc. (SGI) in the mid
1990s, in which John Mashey figured prominently. The first significant academic references
are arguably Weiss and Indurkhya (1998) in computer science and Diebold (2000) in
statistics/econometrics. An unpublished 2001 research note by Douglas Laney at Gartner
enriched the concept significantly. Hence the term “Big Data” appears reasonably attributed to
Mashey, Weiss and Indurkhya, Diebold, and Laney. Big Data the phenomenon continues
unabated, and Big Data the discipline is emerging.
Recent technological advances and novel applications, such as sensors, cyber-physical
systems, smart mobile devices, cloud systems, data analytics, and social networks, are making
it possible to capture, process, and share huge amounts of data (referred to as big data) and to
extract useful knowledge, such as patterns, from this data and predict trends and events. Big
data is making possible tasks that were previously impossible, like preventing the spread of
disease and crime, personalizing healthcare, and quickly identifying business opportunities.
2.1) Definition:
Big data usually includes data sets with sizes beyond the ability of commonly used
software tools to capture, curate, manage, and process data within a tolerable elapsed time.
Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes
to many petabytes of data. Big data is a set of techniques and technologies that require new
forms of integration to uncover large hidden values from large datasets that are diverse,
complex, and of a massive scale.
Big data can be described by the following characteristics:
Volume – The quantity of data that is generated is very important in this context. It is the size
of the data which determines the value and potential of the data under consideration and
whether it can actually be considered as Big Data or not. The name ‘Big Data’ itself contains
a term which is related to size and hence the characteristic.
Variety - The next aspect of Big Data is its variety. Data analysts need to know the category
to which the data belongs. This helps the people who analyse the data closely to use it
effectively to their advantage, which in turn upholds the importance of the Big Data.
Velocity - The term ‘velocity’ in the context refers to the speed of generation of data or how
fast the data is generated and processed to meet the demands and the challenges which lie
ahead in the path of growth and development.
Variability - This is a factor which can be a problem for those who analyze the data. This refers
to the inconsistency which can be shown by the data at times, thus hampering the process of
being able to handle and manage the data effectively.
Complexity - Data management can become a very complex process, especially when large
volumes of data come from multiple sources. These data need to be linked, connected and
correlated in order to be able to grasp the
3) Big Data analysis pipeline:
3.1) Phases in the Processing Pipeline:
3.1.1) Data Acquisition and Recording
Big Data does not arise out of a vacuum: it is recorded from some data generating
source. For example, consider our ability to sense and observe the world around us, from the
heart rate of an elderly citizen and the presence of toxins in the air we breathe to the planned
Square Kilometre Array telescope, which will produce up to 1 million terabytes of raw data per
day. Similarly, scientific experiments and simulations can easily produce petabytes of data
today.
Much of this data is of no interest, and it can be filtered and compressed by orders of
magnitude. One challenge is to define these filters in such a way that they do not discard useful
information. We need research in the science of data reduction that can intelligently process
this raw data to a size that its users can handle while not missing the needle in the haystack.
Furthermore, we require “on-line” analysis techniques that can process such streaming data on
the fly, since we cannot afford to store first and reduce afterward.
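The idea of "on-line" analysis can be illustrated with a minimal sketch. The example below assumes a stream of numeric sensor readings and summarises it with Welford's running mean and variance, so that no reading has to be stored. The class name `RunningStats`, the helper `summarise` and the heart-rate values are hypothetical and chosen only for illustration; they are not taken from any tool discussed in this report.

```python
class RunningStats:
    """Running mean and variance computed with Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new reading into the summary without storing it."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; reported as 0.0 until two readings have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def summarise(stream):
    """Consume an iterable of readings one at a time and return the summary."""
    stats = RunningStats()
    for reading in stream:
        stats.update(reading)
    return stats


# Hypothetical heart-rate readings (beats per minute) arriving as a stream.
heart_rates = [72, 75, 71, 140, 74]
stats = summarise(iter(heart_rates))
print(stats.count, stats.mean, stats.variance)
```

Only three numbers are kept in memory regardless of how long the stream runs, which is the property that "reduce while recording" depends on.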
The second big challenge is to automatically generate the right metadata to describe
what data is recorded and how it is recorded and measured. For example, in scientific
experiments, considerable detail regarding specific experimental conditions and procedures
may be required to interpret the results correctly.
3.1.2) Information Extraction and Cleaning
Frequently, the information collected will not be in a format ready for analysis.
For example, consider the collection of electronic health records in a hospital, comprising
transcribed dictations from several physicians, structured data from sensors and measurements
(possibly with some associated uncertainty), and image data such as x-rays. We cannot leave
the data in this form and still effectively analyse it. Rather we require an information extraction
process that pulls out the required information from the underlying sources and expresses it in
a structured form suitable for analysis.
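A minimal sketch of such an extraction step is given below. It assumes that a transcribed dictation contains vital signs written in a fixed style such as "BP 130/85" and "heart rate of 88"; both patterns, the function `extract_vitals` and the sample note are hypothetical. Real clinical text is far less regular and would need a much more robust approach than regular expressions.

```python
import re

# Assumed phrasing for a blood-pressure reading, e.g. "BP 130/85".
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)
# Assumed phrasing for a heart rate, e.g. "heart rate 88" or "heart rate of 88".
HR_PATTERN = re.compile(r"\bheart rate\s*(?:of\s*)?(\d{2,3})\b", re.IGNORECASE)


def extract_vitals(note):
    """Turn a free-text note into a structured record; missing fields stay None."""
    record = {"systolic": None, "diastolic": None, "heart_rate": None}
    bp = BP_PATTERN.search(note)
    if bp:
        record["systolic"] = int(bp.group(1))
        record["diastolic"] = int(bp.group(2))
    hr = HR_PATTERN.search(note)
    if hr:
        record["heart_rate"] = int(hr.group(1))
    return record


# Hypothetical transcribed dictation.
note = "Patient reports mild dizziness. BP 130/85, heart rate of 88. X-ray ordered."
vitals = extract_vitals(note)
print(vitals)
```

Fields that cannot be found are left as `None` rather than guessed, so that the later cleaning stage can see what is missing.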
3.1.3) Data Integration, Aggregation, and Representation
Data can be very heterogeneous and may have different metadata. Data integration,
even in more conventional cases, requires huge human efforts. Novel approaches that can
improve the automation of data integration are critical as manual approaches will not scale to
what is required for big data. Also, different data aggregation and representation strategies may
be needed for different data analysis tasks.
Even for simpler analyses that depend on only one data set, there remains an
important question of suitable database design. Usually, there will be many alternative ways
in which to store the same information. Certain designs will have advantages over others for
certain purposes, and possibly drawbacks for other purposes.
3.1.4) Query Processing, Data Modelling, and Analysis
Methods suitable for big data need to be able to deal with noisy, dynamic,
heterogeneous, untrustworthy data and data characterized by complex relations. However,
despite these difficulties, big data, even if noisy and uncertain, can be more valuable for
identifying reliable hidden patterns and knowledge than tiny samples of good
data. Also, the (often redundant) relationships existing among data can represent an opportunity
for cross-checking data and thus improving data trustworthiness. Supporting query processing
and data analysis requires scalable mining algorithms and powerful computing infrastructures.
A problem with current Big Data analysis is the lack of coordination between
database systems, which host the data and provide SQL querying, and analytics packages that
perform various forms of non-SQL processing, such as data mining and statistical analyses.
Today’s analysts are impeded by a tedious process of exporting data from the database,
performing a non-SQL process and bringing the data back.
3.1.5) Interpretation
Having the ability to analyze Big Data is of limited value if users cannot understand
the analysis. Ultimately, a decision-maker, provided with the result of analysis, has to interpret
these results. It is rarely enough to provide just the results. Rather, one must provide
supplementary information that explains how each result was derived, and based upon
precisely what inputs. Such supplementary information is called the provenance of the (result)
data.
Visualizations become important in conveying to the users the results of the queries in
a way that is best understood in the particular domain. Whereas early business intelligence
systems’ users were content with tabular presentations, today’s analysts need to pack and
present results in powerful visualizations that assist interpretation, and support user
collaboration. Furthermore, with a few clicks the user should be able to drill down into each
piece of data that she sees and understand its provenance, which is a key feature to
understanding the data. That is, users need to be able to see not just the results, but also
understand why they are seeing those results.
4) Fields of relevance
Big data is relevant for all components of our society. Industry is using big data
for shifting business intelligence from reporting and decision support to prediction and next-
move decisions. This use of big data emphasizes that big data is critical for obtaining actionable
knowledge. Governments are also interested in using big data and predictive analytics to
improve decision making and transparency, to engage citizens in public affairs, and to improve
national security. Healthcare represents another major area to which big data may offer novel
opportunities. Learning health systems are currently focusing on turning health care data into
knowledge, translating that knowledge into practice, and creating new data by means of
advanced information technology. The use of big data technologies can
reduce the cost of healthcare while improving its quality by making care more preventive and
personalized and basing it on more extensive (home-based) continuous monitoring.
Big data is also crucial for research. Many areas of science and engineering are
currently facing a hundred- to thousand-fold increase in the volume of data generated
compared to only one decade ago. This data is produced by many sources including
simulations, high-throughput scientific instruments, satellites, and telescopes. While the
availability of big data is revolutionizing how research is conducted and is leading to the
emergence of a new paradigm of science based on data-intensive computing, at the same time
it poses a significant challenge for scientists. In order to be able to leverage these huge volumes
of data, new techniques and technologies are needed. A new type of e-infrastructure, the
Research Data Infrastructure, must be designed, implemented and optimized to support the full
life cycle of scientific data, its movement across scientific disciplines, and its integration with
published literature.
Fig 2. Infographic showing current and developing state of big data.
5) Tools and techniques: an overview
The main focus of this report is to study the various tools, techniques and
technologies that organizations around the world have adopted to reduce this
monumental volume of data from its unstructured state to a quantifiable, structured and retrievable
state. The techniques provide a fundamental insight into the basics of retrieving useful data
from big databases, or big data as it is called.
Tools and Techniques
- A/B testing
- Crowdsourcing
- Data fusion and integration
- Genetic algorithms
- Machine learning
- Natural language processing
- Signal processing, simulation
- Time series analysis
- Visualization
- Data mining
- Association rule learning
- Classification tree analysis
- Regression analysis
- Sentiment analysis
- Social network analysis
4.1) A/B testing
It is a form of statistical hypothesis testing with two variants, which leads to the
technical term two-sample hypothesis testing used in the field of statistics. Other terms used
for this method include bucket tests and split testing, but these terms also apply to tests
with more than two variants.
A/B testing, also known as split testing, is a method of testing in which
marketing variables are compared with each other to identify the one that brings a better response
rate. In this context, the existing element is called the “control” and the element that is
argued to give a better result is called the “treatment.” Running A/B tests in your marketing
initiatives is a great way to learn how to drive more traffic to your website and generate more
leads from the visits you’re getting. Just a few small tweaks to a landing page, email or
call-to-action can significantly affect the number of leads your company attracts. The insights
stemming from split tests can drastically improve the conversion rates of your landing pages
and the clickthrough rates of your website calls-to-action and email campaigns. In fact, A/B
testing of landing pages is reported to generate up to 30-40% more leads for B2B sites and
20-25% more leads for eCommerce sites.
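Splitting visitors between the control and the treatment can be sketched as follows. The example assumes that every visitor has a stable identifier and assigns the variant by hashing that identifier, so the same visitor always lands in the same group. The function `assign_variant`, the experiment name and the user identifiers are hypothetical; hashing is only one possible way to keep the assignment stable.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.5):
    """Deterministically place a user in "control" or "treatment".

    The experiment name is mixed into the hash so that different
    experiments split the same users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the first 8 hex digits as a number in the range [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical usage: split 10,000 made-up user ids for one experiment.
assignments = [assign_variant(f"user-{i}", "landing-page-headline") for i in range(10000)]
treatment_fraction = assignments.count("treatment") / len(assignments)
print(round(treatment_fraction, 3))
```

Because the assignment depends only on the identifier and the experiment name, no lookup table is needed to give a returning visitor the same experience.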
Fig 3. Analysis of variation in A/B testing
The statistical aspects behind A/B testing
Factor - A controllable experimental variable that is thought to influence the OEC (overall
evaluation criterion). Factors are assigned Values, sometimes called Levels or Versions. Factors
are sometimes called Variables. In simple A/B tests, there is a single factor with two values: A and B.
Variant - A user experience being tested by assigning levels to the factors; it is either the
Control or one of the Treatments. Sometimes referred to as Treatment, although we prefer to
specifically differentiate between the Control, which is a special variant that designates the
existing version being compared against and the new Treatments being tried. In case of a bug,
for example, the experiment is aborted and all users should see the Control variant.
Experimentation Unit - The entity on which observations are made. Sometimes called an item.
The units are assumed to be independent. On the web, the user is the most common
experimentation unit, although some experiments may be done on sessions or page views. For
the rest of the paper, we will assume that the experimentation unit is a user. It is important that
the user receive a consistent experience throughout the experiment, and this is commonly
achieved through cookies.
Null Hypothesis - The hypothesis, often referred to as H0, that the OECs (overall evaluation
criteria) for the variants are not different and that any observed differences during the
experiment are due to random fluctuations.
Confidence level - The probability of failing to reject (i.e., retaining) the null hypothesis when
it is true.
Power - The probability of correctly rejecting the null hypothesis, H0, when it is false.
Power measures our ability to detect a difference when it indeed exists.
A/A Test - Sometimes called a Null Test. Instead of an A/B test, you exercise the
experimentation system, assigning users to one of two groups, but expose them to exactly the
same experience. An A/A test can be used to (i) collect data and assess its variability for power
calculations, and (ii) test the experimentation system (the Null hypothesis should be rejected
about 5% of the time when a 95% confidence level is used).
Standard Deviation (Std-Dev) - A measure of variability, typically denoted by σ.
Standard Error (Std-Err) - For a statistic, it is the standard deviation of the sampling
distribution of the sample statistic. For a mean of n independent observations, it is σ/√n, where
σ is the estimated standard deviation.
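The terms defined above can be tied together in a short sketch. It assumes that the OEC is a conversion rate and compares control and treatment with a pooled two-proportion z-test, which is one common choice and not necessarily the test used by any particular tool. The function names and the visitor counts are hypothetical.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of a mean of n independent observations: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def two_proportion_z_test(conversions_a, users_a, conversions_b, users_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    rate_a = conversions_a / users_a
    rate_b = conversions_b / users_b
    # Under H0 the two groups share one conversion rate, so pool them.
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control converts 200 of 10,000 users, treatment 260 of 10,000.
z, p_value = two_proportion_z_test(200, 10000, 260, 10000)
reject_h0 = p_value < 0.05  # corresponds to a 95% confidence level
print(round(z, 2), reject_h0)
```

In an A/A test both groups see the same experience, so a check like this one should reject the null hypothesis only about 5% of the time at a 95% confidence level.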
On an e-commerce website you can test:
- Content: headlines, texts, product descriptions, testimonials, etc. I strongly believe that words
can make a huge difference for any business. Communicating the right message, in the right
way, to the right audience will boost conversions instantly.
- Images and video. Sometimes, people tend to skip the lines and simply look at the pictures on
your website. Always display high quality pictures, related to the topics on site.
- Call-to-action buttons.
- Design. I include here: fonts, colours, position of elements on page, etc. The whole website
must have one design to match the brand’s identity. Use matching colours and always check
the meaning of every colour because it has a huge impact on visitors.
Benefits of A/B testing
A/B testing comes in handy because it lowers risk when it comes to important decisions in
the company. Doing A/B testing constantly will point out what to do and what not to do on your
website, and you will know what decision to make.
With A/B testing, failure is not an option. I say this because you have nothing to lose in an A/B
testing experiment. Even if the test hasn’t reached statistical significance or if the results are
not what you expected, there’s no financial loss involved.
It is cheaper to use A/B testing than to directly modify your website. If you decide to
modify your website without testing it first, you invest lots of money and time in programming
and design, and nothing can tell you whether the money you spend will come back to you as profit.
But if you test the variations and realize it’s not worthwhile to make those changes, you save
time and money.
Some online tools that help in carrying out A/B testing are Google Analytics Content Experiments,
Optimizely, Unbounce, Wingify, Genetify, Five Second Test, etc.
4.2) Crowdsourcing
Crowdsourcing represents the act of a company or institution taking a function once
performed by employees and outsourcing it to an undefined (and generally large) network of
people in the form of an open call. This can take the form of peer-production (when the job is
performed collaboratively), but is also often undertaken by sole individuals. The crucial
prerequisite is the use of the open call format and the large network of potential labourers.
Fig 4. Systematic data flow in crowdsourcing analytics
From avoiding traffic jams, to analysing pedestrian flow patterns, to finding the best
public toilet in town, crowdsourcing apps are showing that many smartphones make for light
work.
With thousands of mini-reports coming in from around the internet, a mosaic of
information can form a larger picture that can be used for many different purposes, from
meteorology to car-sharing.
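How many small reports combine into a larger picture can be sketched in a few lines. The example assumes that each mini-report is a (location, value) pair, for instance a traffic speed sent from a phone, and takes the median per location so that a single faulty report does not distort the result. The function `aggregate_reports`, the threshold `min_reports` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_reports(reports, min_reports=3):
    """Combine (location, value) reports into one median value per location.

    Locations with fewer than min_reports reports are left out, because
    too few reports give an unreliable picture.
    """
    by_location = defaultdict(list)
    for location, value in reports:
        by_location[location].append(value)
    return {
        location: median(values)
        for location, values in by_location.items()
        if len(values) >= min_reports
    }


# Hypothetical traffic-speed reports in km/h; the value 90 is an outlier.
reports = [("MG Road", 12), ("MG Road", 15), ("MG Road", 90), ("Ring Road", 40)]
picture = aggregate_reports(reports)
print(picture)
```

The median is used here instead of the mean as a simple guard against the noise that crowdsourced data brings with it.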
Using the intelligence of a vast interconnected organism, however, is nothing new: the
venerable Oxford English Dictionary may in fact be the earliest example of crowdsourcing. In
the mid-19th century it made an open call for volunteers to log words and provide examples of
their usage. Over a 70-year period, it received more than six million submissions. Today,
crowdsourcing is used in investing, in creative work and in funding start-up projects.
Crowdsourcing sites represent just one more type of network that will connect
people with products and technology, telling you what products they used, what they thought
of them, and what reviews they read, liked or shared. If you then link the crowdsourcing
network to a business network like LinkedIn, you can connect companies to reviewers and
bring in lots of context that, when comprehensively analysed, can transform your
understanding of the reviews:
- Analyze the reviews for opinion. What companies are using what products, what do they think of
them, and why?
- Analyze the interactions for need and intent. If someone read lots of reviews about CRM
systems, there is a reasonable chance they may need a CRM system. If they then share or
recommend a particular review, that may indicate intent to buy or intent to investigate further.
It may also indicate an attempt to sell; however, this is easy to catch from the business network.
- Analyze the business network for context. Company name, maybe industry, products, location,
social network (which itself could be analyzable for further context – what they are talking
about, who they are associated with, etc.).
All this analysis lays down more data about the data, and allows you to model the
products in a whole new way inferring new characteristics from the organizations providing
the feedback. You can then provide analysis that is personalized around common attributes
between organizations and provide a much deeper drill-down based on a broader set of
harvested features.
This type of analysis goes way beyond what a traditional analyst can possibly
achieve, although it clearly brings with it the risk of introducing noise. So, while the analyst is
not going anywhere anytime soon (and the great ones will always be in demand), these new
approaches of gathering and analyzing data will challenge the status quo. To what outcome,
only time will tell.
Some common crowdsourcing platforms are Quora, Frilp, Indiegogo, Kickstarter
(crowdfunding), etc.
4.3) Machine learning
Machine learning is a scientific discipline that explores the construction and study
of algorithms that can learn from data. Such algorithms operate by building a model from
example inputs and using that to make predictions or decisions, rather than following strictly
static program instructions. Machine learning is closely related to and often overlaps
with computational statistics, a discipline which also specializes in prediction-making.
Machine learning is a subfield of computer science stemming from research
into artificial intelligence. It has strong ties to statistics and mathematical optimization, which
deliver methods, theory and application domains to the field. Machine learning is employed in
a range of computing tasks where designing and programming explicit, rule-based algorithms
is infeasible. Example applications include spam filtering, optical character
recognition (OCR), search engines and computer vision. Machine learning is sometimes
conflated with data mining, although that focuses more on exploratory data analysis. Machine
learning and pattern recognition "can be viewed as two facets of the same field."
When employed in industrial contexts, machine learning methods may be referred
to as predictive analytics or predictive modelling.
Types of machine learning
Machine learning is usually divided into two main types.
In the predictive or supervised learning approach, the goal is to learn a mapping from
inputs x to outputs y, given a labeled set of input-output pairs D = {(x_i, y_i)}, i = 1, ..., N.
Here D is called the training set, and N is the number of training examples. In the simplest
setting, each training input x_i is a D-dimensional vector of numbers, representing, say, the
height and weight of a person. These are called features, attributes or covariates. In general,
however, x_i could be a complex structured object, such as an image, a sentence, an email
message, a time series, a molecular shape, a graph, etc. Similarly, the form of the output or
response variable can in principle be anything, but most methods assume that y_i is a
categorical or nominal variable from some finite set, y_i ∈ {1, ..., C} (such as male or female),
or that y_i is a real-valued scalar (such as income level). When y_i is categorical, the problem
is known as classification or pattern recognition, and when y_i is real-valued, the problem is
known as regression.
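Both kinds of supervised problem can be illustrated with a small sketch in plain Python. The classification part uses a one-nearest-neighbour rule and the regression part fits a straight line by least squares; these two methods are chosen here only because they are short, and other learning algorithms would serve equally well. All function names, labels and data points are hypothetical.

```python
import math


def nearest_neighbour_predict(training_set, x):
    """Classification: return the label y_i of the training input x_i closest to x."""
    _, label = min(training_set, key=lambda pair: math.dist(pair[0], x))
    return label


def fit_line(training_set):
    """Regression: least-squares fit of y = slope * x + intercept for scalar x."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set D: (height in cm, weight in kg) with a categorical label.
people = [((150, 50), "small"), ((152, 53), "small"), ((180, 85), "large"), ((178, 80), "large")]
predicted_label = nearest_neighbour_predict(people, (175, 78))

# Hypothetical training set D with a real-valued response.
points = [(1, 2), (2, 4), (3, 6)]
slope, intercept = fit_line(points)
print(predicted_label, slope, intercept)
```

In both cases the model is built from the labeled pairs alone, and the prediction for a new input can be compared with an observed value to measure the error.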
The second main type of machine learning is the descriptive or unsupervised learning
approach. Here we are only given inputs, D = {x_i}, i = 1, ..., N, and the goal is to find “interesting
patterns” in the data. This is sometimes called knowledge discovery. This is a much less well-
defined problem, since we are not told what kinds of patterns to look for, and there is no
obvious error metric to use (unlike supervised learning, where we can compare our prediction
of y for a given x to the observed value).
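A minimal sketch of unsupervised learning is given below. It uses k-means clustering on one-dimensional inputs as an example of finding "interesting patterns" without labels; k-means is only one of many possible methods, and the number of clusters k has to be assumed in advance. The function `k_means_1d`, its deterministic starting centres and the data are hypothetical choices made for illustration.

```python
def k_means_1d(values, k, iterations=20):
    """Group unlabeled numbers into k clusters; return (centres, clusters)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    if k == 1:
        centres = [ordered[len(ordered) // 2]]
    else:
        centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Update step: move each centre to the mean of its cluster.
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical unlabeled inputs D = {x_i} that form two visible groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centres, clusters = k_means_1d(values, k=2)
print(centres, clusters)
```

There is no observed y to compare against here, which is why judging the quality of the clusters is harder than measuring the error of a supervised model.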
Fig 5. Comparison of the two types of machine learning.
5) CASE STUDIES
5.1) Case study on Indian retail giant Shoppers Stop
Three years ago, when Shoppers Stop Ltd started its Big Data analytics
programme, little did it know that Big Data would lead to big gains. In one of its earliest
analytics programmes, the company studied the buying patterns of members of its loyalty
programme, First Citizen. Based on the insights, it developed targeted promotions for trousers.
This led to around ₹10 crore worth of additional sales in a three-week period for Shoppers Stop.
After analysing its First Citizen base, the company had observed that not all those who buy
shirts also buy trousers. But those who buy both men’s shirts and trousers spend 60% more a
year on average than those who buy only shirts, and thrice as much as those who don’t buy
men’s shirts at all, said Vinay Bhatia, vice-president, marketing and loyalty, Shoppers Stop.
It then shortlisted over 900,000 people for a “targeted trouser communication”.
According to Bhatia, the 900,000 were further divided into three groups of target customers.
The first group included customers who showed a pattern of being interested in new brands in
other non trouser categories. They were sent information on new trouser brand launches and
fits. The second group included those who exhibited multiple buying patterns in other
categories. They were sent attractive deals if they bought two or more trousers.
Finally, the third was a “control group” to measure the success or failure of the promotions.
“This (control group) is a practice that we do for all our analytics insights,” added Bhatia. The
targeted communication exercise led to a lift of 30% in sales (about ₹10 crore) when compared
with the response received from the control group. Big Data analytics is now a crucial part of
the company’s strategy.
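The way a lift is measured against a control group can be sketched as follows. The calculation compares average sales per customer, so that groups of different sizes remain comparable. The function `lift_over_control` and all the figures in the example are hypothetical; they are chosen to reproduce a 30% lift and are not the actual Shoppers Stop numbers.

```python
def lift_over_control(target_sales, target_size, control_sales, control_size):
    """Relative increase in average sales per customer versus the control group."""
    target_average = target_sales / target_size
    control_average = control_sales / control_size
    return (target_average - control_average) / control_average


# Hypothetical figures: 300 targeted customers spend 390,000 in total,
# while 100 control customers spend 100,000 in total.
lift = lift_over_control(390_000, 300, 100_000, 100)
print(f"{lift:.0%}")
```

Without the control group the extra sales could not be separated from what customers would have bought anyway.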
5.2) Case study on Airbnb
Airbnb is an incredible success story. In just a few years, the company has become a
powerhouse in the travel industry, providing travelers with an alternative to hotels, and
providing individuals who have rooms, apartments or homes to rent with a new source of
income. In 2012, travelers booked over 5 million nights with Airbnb’s service. But it started
small, and its founders—adherents to the Lean Startup mindset—took a very methodical
approach to their success. Joe Zadeh, Product Lead at Airbnb, shared part of the company’s
amazing story. He focused on one aspect of their business: professional photography. It started
with a hypothesis: “Hosts with professional photography will get more business. And hosts
will sign up for professional photography as a service.” This is where the founders’ gut instincts
came in: they had a sense that professional photography would help their business. But rather
than implementing it outright, they built a Concierge Minimum Viable Product (MVP) to
quickly test their hypothesis. Initial tests of their MVP showed that professionally
photographed listings got two to three times more bookings than the market average. This
validated their first hypothesis. And it turned out that hosts were wildly enthusiastic to receive
an offer from Airbnb to take those photographs for them. In mid-to-late 2011, Airbnb had 20
photographers in the field taking pictures for hosts, roughly the same period in which we
see the proverbial “hockey stick” of growth in terms of nights booked.
Summary:
• Airbnb’s team had a hunch that better photos would increase rentals.
• They tested the idea with a Concierge MVP, putting the least effort possible into a test that
would give them valid results.
• When the experiment showed good results, they built the necessary components and rolled
it out to all customers.
Analytics Lessons Learned:
Sometimes, growth comes from an aspect of your business you don’t expect. When
you think you’ve found a worthwhile idea, decide how to test it quickly, with minimal
investment. Define what success looks like beforehand, and know what you’re going to do if
your hunch is right.
5.3) Case study on Indian Elections – The first Prime Minister to use BIG DATA
What makes Modi’s use of big data so impressive is that it was both relatively new to Indian
politics and fraught with unique challenges. Take, for example, the size of the Indian
electorate. With 814 million voters, in comparison to the USA’s 193.6 million and the UK’s
45.5 million, the sheer volume of data on India’s voting population was perhaps the largest
obstacle. The second was the variety of data: India’s voter rolls, in 12 different languages and
spread across 900,000 PDFs amounting to 25 million pages, made for a heterogeneous,
non-uniform and deeply diverse information set. Finally, the veracity of the information was
often questionable: one report noted that some voters were listed as 19,545 years old, and
others a confounding 0 years old. Overlapping names (there are 327,000 women named “Sita”
in Bihar alone) only further complicated the process.
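The veracity problems described above (impossible ages, many voters sharing one name) are the kind of issue a simple validation pass can surface before any analysis. The sketch below is a minimal illustration, not the method used in the campaign: the record layout, the sample records and the `MAX_PLAUSIBLE_AGE` bound are assumptions made for the example.

```python
from collections import Counter

MIN_VOTING_AGE = 18      # minimum voting age in India
MAX_PLAUSIBLE_AGE = 120  # assumed upper bound for a valid age


def flag_implausible_ages(records):
    """Return records whose age falls outside the plausible voter range."""
    return [
        r for r in records
        if not (MIN_VOTING_AGE <= r["age"] <= MAX_PLAUSIBLE_AGE)
    ]


def ambiguous_names(records):
    """Count (normalised name, state) pairs that occur more than once."""
    counts = Counter((r["name"].strip().lower(), r["state"]) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical voter records, invented for illustration.
records = [
    {"id": 1, "name": "Sita", "state": "Bihar", "age": 34},
    {"id": 2, "name": "Sita", "state": "Bihar", "age": 19545},
    {"id": 3, "name": "Ravi", "state": "Delhi", "age": 0},
    {"id": 4, "name": "sita ", "state": "Bihar", "age": 52},
]

bad_ages = flag_implausible_ages(records)
duplicates = ambiguous_names(records)
print([r["id"] for r in bad_ages])
print(duplicates)
```

Records flagged this way would still need a second identifier (address, voter ID) to be resolved; a name alone cannot separate the many voters who share it.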
Despite these challenges, the rewards – as Modi has clearly demonstrated while
employing this data to “drive donations, enroll volunteers, and improve the effectiveness of
everything from door knocks…to social media” – are significant. BJP’s website, for
example, planted cookies on all computers that visited its site, and then used information
about these users’ further internet activity – i.e., the sites they visited after BJP’s – for
customised advertisements:
“If you move out of the BJP website and visit a website for bikes followed by a
search on jobs, the algorithm will make the inference that you are a young male from a
particular constituency, say Delhi, who is currently on a job hunt. What happens next is when
you visit a job searching portal like Naukri.com, this system pops up a contextual ad for you
like ‘jobs in Delhi’. The BJP banner which is just below the results will tell you ‘There are
no Jobs in Delhi. India deserves better’.”
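The inference chain in the quoted example (sites visited, then an inferred profile, then a contextual ad) can be pictured as a small rule table. The sketch below is a toy illustration under stated assumptions: the category names, the `SEGMENT_RULES` table and the ad templates are hypothetical, and the real system is not described in enough detail in the source to reproduce it.

```python
# Hypothetical rules: categories that must all appear in the browsing trail,
# paired with the segment they suggest. More specific rules come first.
SEGMENT_RULES = [
    ({"bikes", "jobs"}, "young job seeker"),
    ({"jobs"}, "job seeker"),
]

# Hypothetical ad templates per segment.
ADS = {
    "young job seeker": "jobs in {city}",
    "job seeker": "jobs in {city}",
}


def infer_segment(visited_categories):
    """Return the first segment whose required categories were all visited."""
    visited = set(visited_categories)
    for required, segment in SEGMENT_RULES:
        if required <= visited:
            return segment
    return None


def pick_ad(visited_categories, city):
    """Choose a contextual ad for the inferred segment, or None if no match."""
    segment = infer_segment(visited_categories)
    if segment is None:
        return None
    return ADS[segment].format(city=city)


print(pick_ad(["bikes", "jobs"], "Delhi"))
print(pick_ad(["news"], "Delhi"))
```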
Tactics like these, spanning both online and offline analytics and marketing, were the
backbone of Modi’s success. He led the charge with both social media and the analysis of
publicly available data. Whereas Indian politicians have been known to rely on “hunches and
intuitions to gauge complex demographics of caste, religion, community and localities…,”
6) 15 Indian Big Data companies to watch out for:
1. Heckyl: A TechSparks 2011 winner in the financial data analytics space.
Founded by Mukund Mudras, Som Sagar, Abhijit Vedak and Jaison Mathews.
2. Sigmoid Analytics: A TechSparks 2014 company based out of Bangalore,
Sigmoid works on real-time Big Data warehousing, streaming and ETL
(extract, transform and load) on Apache Spark. It offers a technology
infrastructure that companies can use to store their data in a desired format,
perform operations on it and generate insights.
3. Flutura: Mines Big Data to perform analytics and gives hidden insights from
huge chunks of machine generated data for global oil and gas majors to bring in
efficiency and safety. Flutura was founded by Krishnan Raman, Derick Jose and
Srikanth Muralidhara.
4. Indix: Computes real-time data to give product insights for decision makers on
an intuitive dashboard. Founded by Sanjay Parthasarathy, the company has its
product engineering center based out of Chennai.
5. Fractal Analytics: Helps companies in predictive analytics and decision sciences
to understand, predict and shape consumer behavior through advanced analytics,
harmonize data, tell visual stories and forecast business performance.
6. Crayon Data: An algorithm called the WhiteBox, Simpler Choices, takes
massive data, cleans it up and presents only actionable insights to the banking,
hospitality and telecom sectors. It was founded by Srikant Sastri, Suresh
Shankar and Vijay Kumar.
7. Germin8: It is a leading Data Analytics company that helps brands with social
media measurement and monitoring solutions by analysing conversations in real
time. The Mumbai-based company was founded in 2007 by Raj Nair and his son
Ranjit Nair.
8. Aureus Analytics: With its platform called ASAP (Aureus Statistical and
Analytics Platform) it produces insights by mining enterprise data. Aureus was
founded by technology professionals Anurag Shah, Ashish Tanna and Nitin
Purohit.
9. Dataswft: A product of Bizosys Technologies Pvt Ltd, it has a customized search
engine that can decode technical information and return search queries within
milliseconds. It was founded by Sunil Guttula, Abinasha Karana and Sridhar
Dhulipala.
10. C360: Corporate360 Pvt Ltd provides IT sales intelligence data services to
enterprises. The startup was founded by college dropout Varun Chandran. Prior
to founding Corporate360, Varun worked as a sales and marketing executive
with the likes of SAP, Oracle, Dell and NetApp. C360 is based in India and
Singapore. Another similarly located Big Data company is Antuit Holdings,
which raised $56 million from Goldman Sachs and Zodius Capital.
11. Metaome: It is a health care Big Data company focused on life sciences, founded
by Kalpana Krishnaswami and Ramkumar Nandkumar. Metaome’s product
DistilBio, offered as a free web-based graph search and as an enterprise platform,
gathers a variety of data from different sources (laboratory data
management systems, private and public databases) and structures it to
help in identifying patterns.
12. Frrole: It is a Social intelligence startup with a media and brands focused
offering, which allows its customers to integrate real-time Twitter data into their
digital properties and TV shows. The startup was founded by Amarpreet Kalkat,
Nishith Sharma and Abhishek Vaid.
13. Bridgei2i: It focuses on user-centric applications of Big Data. Founders are
Prithvijit Roy, Ashish Sharma, Pritam Kanti Paul.
14. Formcept: Focused on making data analysis accessible to everyone; founders are
Suresh Srinivasan and Anuj Kumar.
15. PromptCloud: A DaaS (Data-as-a-Service) platform that crawls the web for
data extraction. It was founded by Prashant Kumar.
7) Bibliography
1. "Data, data everywhere". The Economist. 25 February 2010. Retrieved 9
December 2012.
2. "Community cleverness required". Nature 455 (7209): 1. 4
September 2008.
3. "Sandia sees data management challenges spiral".
4. Reichman, O.J.; Jones, M.B.; Schildhauer, M.P. (2011).
"Challenges and Opportunities of Open Data in Ecology".
5. Practical Guide to Controlled Experiments on the Web: Listen to Your
Customers.
6. An introduction to a/b testing for marketing optimization.
7. Machine Learning: A Probabilistic Perspective by Kevin P. Murphy.
8. CDAS: A Crowdsourcing Data Analytics System, paper by NUS.
9. Case study about Shoppers’ Stop on
http://www.livemint.com/Industry/J5NVBrcewAEM0qF02daqyL/Retail-sector-gains-big-from-Big-Data.html
10. Case study about Indian election on
http://dataconomy.com/narendra-modi-first-prime-minister-use-big-data-analytics/
11. Case study on AirBnB on
http://www.quibb.com/links/analytics-lessons-learned/view

 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 

Recently uploaded (20)

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 

Big Data Analytics

  • 1. Tools and techniques adopted for Big Data analytics. Dept. of Industrial Engineering and Management, Bangalore Institute of Technology
INDEX
- OVERVIEW
- INTRODUCTION
- BIG DATA ANALYSIS PIPELINE
-Data Acquisition and Recording
-Information Extraction and Cleaning
-Data Integration, Aggregation, and Representation
-Query Processing, Data Modelling, and Analysis
-Interpretation
- FIELDS OF RELEVANCE
- TOOLS AND TECHNIQUES IN DATA ANALYTICS: AN OVERVIEW
- A/B testing
- Crowdsourcing
- Machine learning
- CASE STUDIES
- Shoppers Stop
- Airbnb
- Indian Elections
- 15 Upcoming BIG DATA startups
  • 2. Overview
“Information is the oil and data analytics is the combustion engine” - Peter Sondergaard
Since the invention of the World Wide Web, information has been made accessible at a minute level, and this has in turn created a lot of unstructured data. Technology and mathematical pioneers predicted the boom in information even before modern times. The scenarios that led to such a conclusion included the ever-expanding information in the field of science, the maintenance of census details, and the huge number of journals and publications that had to be stored, which seemed a tedious task. Today information is expanding faster still, thanks to the connectivity of society through the internet and mobile phones.
This report is a glance into the tools and techniques that organizations adopt today to reach optimized solutions to various problems, to decide which functions to incorporate into products, and to add value to each decision taken. The initial effort was to understand the functional aspects of big data. The report then surveys the upcoming tools as a whole and focuses on three major concepts among them.
The study then analyses three cases of different magnitude, scope and location to understand the application of big data at a fundamental and practical level. It focuses on two Indian cases and one international case. The study also aims at understanding upcoming trends in the analytics environment by listing the top 15 startups based solely on data science.
  • 3. 2) Introduction:
The term “Big Data,” which spans computer science and statistics/econometrics, probably originated in lunch-table conversations at Silicon Graphics Inc. (SGI) in the mid 1990s, in which John Mashey figured prominently. The first significant academic references are arguably Weiss and Indurkhya (1998) in computer science and Diebold (2000) in statistics/econometrics. An unpublished 2001 research note by Douglas Laney at Gartner enriched the concept significantly. Hence the term “Big Data” appears reasonably attributed to Mashey, Weiss and Indurkhya, Diebold, and Laney. Big Data the phenomenon continues unabated, and Big Data the discipline is emerging.
Recent technological advances and novel applications, such as sensors, cyber-physical systems, smart mobile devices, cloud systems, data analytics, and social networks, are making it possible to capture, process, and share huge amounts of data (referred to as big data), to extract useful knowledge such as patterns from this data, and to predict trends and events. Big data is making possible tasks that were impossible before, such as preventing the spread of disease and crime, personalizing healthcare, and quickly identifying business opportunities.
2.1) Definition:
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex, and of a massive scale.
  • 4. Big data can be described by the following characteristics:
Volume - The quantity of data generated is very important in this context. It is the size of the data that determines its value and potential, and whether it can actually be considered Big Data or not. The name ‘Big Data’ itself contains a term related to size, hence the characteristic.
Variety - The next aspect of Big Data is its variety. The category to which the data belongs is an essential fact that data analysts need to know. Knowing it helps the people who closely analyze the data to use it effectively to their advantage, and thus upholds the importance of the Big Data.
Velocity - The term ‘velocity’ in this context refers to the speed at which data is generated and processed to meet the demands and challenges that lie ahead in the path of growth and development.
Variability - This is a factor that can be a problem for those who analyze the data. It refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected and correlated in order to be able to grasp the
  • 5. 3) Big Data analysis pipeline:
3.1) Phases in the Processing Pipeline:
3.1.1) Data Acquisition and Recording
Big Data does not arise out of a vacuum: it is recorded from some data generating source. For example, consider our ability to sense and observe the world around us, from the heart rate of an elderly citizen, and presence of toxins in the air we breathe, to the planned Square Kilometre Array telescope, which will produce up to 1 million terabytes of raw data per day. Similarly, scientific experiments and simulations can easily produce petabytes of data today. Much of this data is of no interest, and it can be filtered and compressed by orders of magnitude. One challenge is to define these filters in such a way that they do not discard useful information. We need research in the science of data reduction that can intelligently process this raw data to a size that its users can handle while not missing the needle in the haystack.
  • 6. Furthermore, we require “on-line” analysis techniques that can process such streaming data on the fly, since we cannot afford to store first and reduce afterward. The second big challenge is to automatically generate the right metadata to describe what data is recorded and how it is recorded and measured. For example, in scientific experiments, considerable detail regarding specific experimental conditions and procedures may be required to interpret the results correctly.
3.1.2) Information Extraction and Cleaning
Frequently, the information collected will not be in a format ready for analysis. For example, consider the collection of electronic health records in a hospital, comprising transcribed dictations from several physicians, structured data from sensors and measurements (possibly with some associated uncertainty), and image data such as x-rays. We cannot leave the data in this form and still effectively analyse it. Rather, we require an information extraction process that pulls out the required information from the underlying sources and expresses it in a structured form suitable for analysis.
3.1.3) Data Integration, Aggregation, and Representation
Data can be very heterogeneous and may have different metadata. Data integration, even in more conventional cases, requires huge human effort. Novel approaches that can improve the automation of data integration are critical, as manual approaches will not scale to what is required for big data. Different data aggregation and representation strategies may also be needed for different data analysis tasks. Even for simpler analyses that depend on only one data set, there remains an important question of suitable database design. Usually, there will be many alternative ways in which to store the same information. Certain designs will have advantages over others for certain purposes, and possibly drawbacks for other purposes.
3.1.4) Query Processing, Data Modelling, and Analysis
Methods suitable for big data need to be able to deal with noisy, dynamic, heterogeneous, untrustworthy data and data characterized by complex relations. Despite these difficulties, big data, even if noisy and uncertain, can be more valuable for identifying reliable hidden patterns and knowledge than tiny samples of good data. Also, the (often redundant) relationships existing among data can represent an opportunity for cross-checking data and thus improve data trustworthiness.
  • 7. Supporting query processing and data analysis requires scalable mining algorithms and powerful computing infrastructures. A problem with current Big Data analysis is the lack of coordination between database systems, which host the data and provide SQL querying, and analytics packages that perform various forms of non-SQL processing, such as data mining and statistical analyses. Today’s analysts are impeded by a tedious process of exporting data from the database, performing a non-SQL process and bringing the data back.
3.1.5) Interpretation
Having the ability to analyze Big Data is of limited value if users cannot understand the analysis. Ultimately, a decision-maker, provided with the result of analysis, has to interpret these results. It is rarely enough to provide just the results. Rather, one must provide supplementary information that explains how each result was derived, and based upon precisely what inputs. Such supplementary information is called the provenance of the (result) data.
Visualizations become important in conveying to the users the results of the queries in a way that is best understood in the particular domain. Whereas early business intelligence systems’ users were content with tabular presentations, today’s analysts need to pack and present results in powerful visualizations that assist interpretation and support user collaboration. Furthermore, with a few clicks the user should be able to drill down into each piece of data that she sees and understand its provenance, which is a key feature for understanding the data. That is, users need to be able to see not just the results, but also understand why they are seeing those results.
  • 8. 4) Fields of relevance
Big data is relevant for all components of our society. Industry is using big data to shift business intelligence from reporting and decision support to prediction and next-move decisions. This use emphasizes that big data is critical for obtaining actionable knowledge. Governments are also interested in using big data and predictive analytics to improve decision making and transparency, to engage citizens in public affairs, and to improve national security.
Healthcare represents another major area to which big data may offer novel opportunities. Learning health systems are currently focusing on turning health care data into knowledge, translating that knowledge into practice, and creating new data by means of advanced information technology. It has been pointed out that the use of big data technologies can reduce the cost of healthcare while improving its quality by making care more preventive and personalized and basing it on more extensive (home-based) continuous monitoring.
Big data is also crucial for research. Many areas of science and engineering are currently facing a hundred- to a thousand-fold increase in the volume of data generated compared to only one decade ago. This data is produced by many sources including simulations, high-throughput scientific instruments, satellites, and telescopes. While the availability of big data is revolutionizing how research is conducted and is leading to the emergence of a new paradigm of science based on data-intensive computing, at the same time it poses a significant challenge for scientists. In order to leverage these huge volumes of data, new techniques and technologies are needed.
A new type of e-infrastructure, the Research Data Infrastructure, must be designed, implemented and optimized to support the full life cycle of scientific data, its movement across scientific disciplines, and its integration with published literature.
  • 9. Fig 2. Infographic showing current and developing state of big data.
  • 10. 5) Tools and techniques: an overview
The main focus of this report is to study the various tools, techniques and technologies that organizations around the world have adopted to reduce this monumental volume of data from its unstructured state to a quantifiable, structured and retrievable state. The techniques provide a fundamental insight into the basics of retrieving useful data from big databases, or big data as it is referred to.
Tools and Techniques
- A/B testing
- Crowdsourcing
- Data fusion and integration
- Genetic algorithms
- Machine learning
- Natural language processing
- Signal processing, simulation
- Time series analysis
- Visualization
- Data mining
- Association rule learning
- Classification tree analysis
- Regression analysis
- Sentiment analysis
- Social network analysis
  • 11. 4.1) A/B testing
A/B testing is a form of statistical hypothesis testing with two variants, which gives rise to the technical term used in statistics, two-sample hypothesis testing. Other terms used for this method include bucket testing and split testing, although these terms also apply to tests with more than two variants.
In marketing, A/B testing is a method through which marketing variables are compared to each other to identify the one that brings a better response rate. In this context, the element that is being tested against is called the “control” and the element that is argued to give a better result is called the “treatment.”
Running A/B tests in your marketing initiatives is a great way to learn how to drive more traffic to your website and generate more leads from the visits you’re getting. Just a few small tweaks to a landing page, email or call-to-action can significantly affect the number of leads your company attracts. The insights stemming from split tests can drastically improve the conversion rates of your landing pages and the clickthrough rates of your website calls-to-action and email campaigns. In fact, A/B testing of landing pages can generate up to 30-40% more leads for B2B sites and 20-25% more leads for eCommerce sites.
Fig 3. Analysis of variation in A/B testing
  • 12. The statistical aspects behind A/B testing
Factor - A controllable experimental variable that is thought to influence the OEC (overall evaluation criterion). Factors are assigned Values, sometimes called Levels or Versions. Factors are sometimes called Variables. In simple A/B tests, there is a single factor with two values: A and B.
Variant - A user experience being tested by assigning levels to the factors; it is either the Control or one of the Treatments. Sometimes referred to as Treatment, although we prefer to specifically differentiate between the Control, which is a special variant that designates the existing version being compared against, and the new Treatments being tried. In case of a bug, for example, the experiment is aborted and all users should see the Control variant.
Experimentation Unit - The entity on which observations are made. Sometimes called an item. The units are assumed to be independent. On the web, the user is the most common experimentation unit, although some experiments may be done on sessions or page views. For the rest of this report, we assume that the experimentation unit is a user. It is important that the user receive a consistent experience throughout the experiment, and this is commonly achieved through cookies.
Null Hypothesis - The hypothesis, often referred to as H0, that the OECs for the variants are not different and that any observed differences during the experiment are due to random fluctuations.
Confidence level - The probability of failing to reject (i.e., retaining) the null hypothesis when it is true.
Power - The probability of correctly rejecting the null hypothesis, H0, when it is false. Power measures our ability to detect a difference when it indeed exists.
A/A Test - Sometimes called a Null Test. Instead of an A/B test, you exercise the experimentation system, assigning users to one of two groups, but expose them to exactly the same experience. An A/A test can be used to (i) collect data and assess its variability for power calculations, and (ii) test the experimentation system (the Null hypothesis should be rejected about 5% of the time when a 95% confidence level is used).
Standard Deviation (Std-Dev) - A measure of variability, typically denoted by σ.
Standard Error (Std-Err) - For a statistic, it is the standard deviation of the sampling distribution of the sample statistic. For a mean of n independent observations, it is σ / √n, where σ is the estimated standard deviation.
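The definitions above can be tied together with a small worked example. The following is a minimal sketch, assuming the OEC is a conversion rate and that a pooled two-proportion z-test is an acceptable test; the function names and the visitor and conversion counts are hypothetical and are not taken from the report.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of a mean of n independent observations: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: Control (A) and Treatment (B) convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both variants share one conversion rate, so pool the data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 5000 users per variant, 200 vs 250 conversions.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)

# With a 95% confidence level, reject H0 when the p-value is below 0.05.
reject_h0 = p_value < 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Running the same function on two groups that received an identical experience would correspond to the A/A test described above, where rejections should occur only about 5% of the time.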
  • 13. On an e-commerce website you can test:
- Content: headlines, texts, product descriptions, testimonials, etc. I strongly believe that words can make a huge difference for any business. Communicating the right message, in the right way, to the right audience will boost conversions instantly.
- Images and video. Sometimes, people tend to skip the lines and simply look at the pictures on your website. Always display high quality pictures, related to the topics on site.
- Call-to-action buttons.
- Design. I include here: fonts, colours, position of elements on page, etc. The whole website must have one design to match the brand’s identity. Use matching colours and always check the meaning of every colour because it has a huge impact on visitors.
Benefits of A/B testing
A/B testing comes in handy because it lowers the risk of important decisions in the company. Doing A/B testing constantly will point out what to do and what not to do on your website, and you will know what decision to make. With A/B testing, failure is not an option. I say this because you have nothing to lose in an A/B testing experiment. Even if the test hasn’t reached statistical significance or the results are not what you expected, there’s no financial loss involved.
It is cheaper to use A/B testing than to modify your website directly. If you decide to modify your website without testing it first, you invest a lot of money and time in programming and design, and nothing can tell you whether the money you spend will come back to you as profit. But if you test the variations and realize it’s not worth making those changes, you save time and money.
Some online tools that help in carrying out A/B testing are Google Analytics Content Experiments, Optimizely, Unbounce, Wingify, Genetify, Five Second Test, etc.
  • 14. 4.2) Crowdsourcing
Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential labourers.
Fig 4. Systematic data flow in crowdsourcing analytics
From avoiding traffic jams, to analysing pedestrian flow patterns, to finding the best public toilet in town, crowdsourcing apps are showing that many smartphones make for light work. With thousands of mini-reports coming in from around the internet, a mosaic of information can form a larger picture that can be used for many different purposes, from meteorology to car-sharing. Using the intelligence of a vast interconnected organism, however, is nothing new: the venerable Oxford English Dictionary may in fact be the earliest example of crowdsourcing. In the mid-19th century it made an open call for volunteers to log words and provide examples of their usage. Over a 70-year period, it received more than six million submissions. Today, crowdsourcing is used in investing, in creative work and in funding start-up projects.
  • 15. Crowdsourced review sites represent just one more type of network that will connect people with products and technology, telling you what products they used, what they thought of them, and what reviews they read, liked or shared. If you then link the crowdsourcing network to a business network like LinkedIn, you can connect companies to reviewers and bring in a lot of context that, when comprehensively analysed, can transform your understanding of the reviews:
- Analyze the reviews for opinion. What companies are using what products, what do they think of them, and why?
- Analyze the interactions for need and intent. If someone reads lots of reviews about CRM systems, there is a reasonable chance they may need a CRM system. If they then share or recommend a particular review, that may indicate intent to buy or intent to investigate further. It may also indicate an attempt to sell; however, this is easy to catch from the business network.
- Analyze the business network for context. Company name, maybe industry, products, location, social network (which itself could be analyzable for further context: what they are talking about, who they are associated with, etc.)
All this analysis lays down more data about the data, and allows you to model the products in a whole new way, inferring new characteristics from the organizations providing the feedback. You can then provide analysis that is personalized around common attributes between organizations and provide a much deeper drill-down based on a broader set of harvested features. This type of analysis goes way beyond what a traditional analyst can possibly achieve, although it clearly brings with it the risk of introducing noise.
So, while the analyst is not going anywhere anytime soon (and the great ones will always be in demand), these new approaches to gathering and analyzing data will challenge the status quo. To what outcome, only time will tell.
Some common crowdsourcing platforms are Quora, Frilp, Indiegogo and Kickstarter (crowdfunding).
  • 16. Tools and techniques adopted for Big Data analytics. Dept.of Industrial EngineeringandManagement,Bangalore Instituteof Technology| 16 4.3) Machine learning Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions. Machine learning is closely related to and often overlaps with computational statistics; a discipline which also specializes in prediction-making. Machine learning is a subfield of computer science stemming from research into artificial intelligence. It has strong ties to statistics and mathematical optimization, which deliver methods, theory and application domains to the field. Machine learning is employed in a range of computing tasks where designing and programming explicit, rule-based algorithms infeasible. Example applications include spam filtering, optical character recognition(OCR), search engines and computer vision. Machine learning is sometimes conflated with data mining, although that focuses more on exploratory data analysis. Machine learning and pattern recognition "can be viewed as two facets of the same field." When employed in industrial contexts, machine learning methods may be referred to as predictive analytics or predictive modelling. Types of machine learning Machine learning is usually divided into two main types. In the predictive or supervised learning approach, the goal is to learn a mapping from inputs x to outputs y, given a labeled set of input-output pairs D = {(xi, yi)} N i=1. Here D is called the training set, and N is the number of training examples. In the simplest setting, each training input xi is a D-dimensional vector of numbers, representing, say, the height and weight of a person. These are called features, attributes or covariates. 
In general, however, $x_i$ could be a complex structured object, such as an image, a sentence, an email message, a time series, a molecular shape, a graph, etc. Similarly, the form of the output or response variable can in principle be anything, but most methods assume that $y_i$ is a categorical or nominal variable from some finite set, $y_i \in \{1, \ldots, C\}$ (such as male or female), or that $y_i$ is a real-valued scalar (such as income level). When $y_i$ is categorical, the problem is known as classification or pattern recognition, and when $y_i$ is real-valued, the problem is known as regression.
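The distinction between classification and regression can be made concrete with a small sketch. The two learners below (a nearest-neighbour classifier and a least-squares line fit) are chosen purely for illustration and are not methods named in this report; the function names and all data values are hypothetical.

```python
import math


def nearest_neighbor_classify(train, x):
    """Classification: return the label of the training input closest to x."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label


def fit_line(train):
    """Regression: least-squares fit of y = a * x + b for scalar inputs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical training set for classification:
# inputs are (height in cm, weight in kg), labels come from {1, 2}.
clf_train = [
    ((150.0, 50.0), 1),
    ((155.0, 55.0), 1),
    ((180.0, 85.0), 2),
    ((185.0, 90.0), 2),
]
predicted_class = nearest_neighbor_classify(clf_train, (178.0, 80.0))

# Hypothetical training set for regression: scalar input, real-valued output.
reg_train = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
slope, intercept = fit_line(reg_train)
predicted_value = slope * 5.0 + intercept
```

In both cases the learner sees labeled pairs and is judged by how well its prediction for a new input matches the true output; only the type of the output differs.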
The second main type of machine learning is the descriptive or unsupervised learning approach. Here we are only given inputs, $\mathcal{D} = \{x_i\}_{i=1}^{N}$, and the goal is to find “interesting patterns” in the data. This is sometimes called knowledge discovery. This is a much less well-defined problem, since we are not told what kinds of patterns to look for, and there is no obvious error metric to use (unlike supervised learning, where we can compare our prediction of $y$ for a given $x$ to the observed value).

Fig 4. Comparison of 2 types of machine learning.
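As an illustration of finding patterns without labels, the sketch below groups unlabeled points with k-means clustering. K-means is one common unsupervised method picked here as an example; the report does not name it, and the function name, starting centroids and data points are all hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternately assign points to the nearest centroid
    and move each centroid to the mean of its assigned points."""
    centroids = list(centroids)

    def nearest(p):
        # Index of the centroid closest to point p.
        return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    assignments = [nearest(p) for p in points]
    return centroids, assignments


# Hypothetical unlabeled inputs: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = k_means(points, centroids=[points[0], points[3]])
```

Note that nothing tells the algorithm which grouping is "correct"; it simply reports the structure it finds, which is why evaluating unsupervised methods is harder than evaluating supervised ones.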
5) CASE STUDIES

5.1) Case study on Indian retail giant Shoppers Stop

Three years ago, when Shoppers Stop Ltd started its Big Data analytics programme, little did it know that Big Data would lead to big gains. In one of its earliest analytics programmes, the company studied the buying patterns of members of its loyalty programme, First Citizen. Based on the insights, it developed targeted promotions for trousers. This led to around ₹10 crore worth of additional sales in a three-week period for Shoppers Stop.

After analysing its First Citizen base, the company had observed that not all those who buy shirts also buy trousers. But those who buy both men’s shirts and trousers spend 60% more a year on average than those who buy only shirts, and thrice as much as those who don’t buy men’s shirts at all, said Vinay Bhatia, vice-president, marketing and loyalty, Shoppers Stop. It then shortlisted over 900,000 people for a “targeted trouser communication”.

According to Bhatia, the 900,000 were further divided into three groups of target customers. The first group included customers who showed a pattern of being interested in new brands in other, non-trouser categories. They were sent information on new trouser brand launches and fits. The second group included those who exhibited multiple buying patterns in other categories. They were sent attractive deals if they bought two or more trousers. Finally, the third was a “control group” to measure the success or failure of the promotions. “This (control group) is a practice that we do for all our analytics insights,” added Bhatia. The targeted communication exercise led to a lift of 30% in sales (about ₹10 crore) when compared with the response received from the control group. Big Data analytics is now a crucial part of the company’s strategy.
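The role of the control group can be shown with a short calculation. This is a minimal sketch of how a lift figure is typically derived, not Shoppers Stop's actual method; the function name and all sales figures below are hypothetical and were chosen only so that the result is a 30% lift.

```python
def lift(treated_sales, treated_size, control_sales, control_size):
    """Relative increase in sales per customer of the treated (targeted)
    group over the control group that received no promotion."""
    treated_rate = treated_sales / treated_size
    control_rate = control_sales / control_size
    return (treated_rate - control_rate) / control_rate


# Hypothetical figures: 1,000 targeted customers spent 1,300 units in total,
# while 500 control customers spent 500 units in total.
campaign_lift = lift(
    treated_sales=1300.0, treated_size=1000,
    control_sales=500.0, control_size=500,
)

# Incremental sales: what the targeted group spent beyond what it would have
# spent at the control group's per-customer rate.
incremental_sales = 1300.0 - (500.0 / 500) * 1000
```

Comparing per-customer rates rather than totals matters because the targeted and control groups are rarely the same size.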
5.2) Case study on Airbnb

Airbnb is an incredible success story. In just a few years, the company has become a powerhouse in the travel industry, providing travelers with an alternative to hotels, and providing individuals who have rooms, apartments or homes to rent with a new source of income. In 2012, travelers booked over 5 million nights with Airbnb’s service. But it started small, and its founders, adherents of the Lean Startup mindset, took a very methodical approach to their success.

Joe Zadeh, Product Lead at Airbnb, shared part of the company’s amazing story. He focused on one aspect of their business: professional photography. It started with a hypothesis: “Hosts with professional photography will get more business. And hosts will sign up for professional photography as a service.” This is where the founders’ gut instincts came in: they had a sense that professional photography would help their business. But rather than implementing it outright, they built a Concierge Minimum Viable Product (MVP) to quickly test their hypothesis.

Initial tests of their MVP showed that professionally photographed listings got two to three times more bookings than the market average. This validated their first hypothesis. And it turned out that hosts were wildly enthusiastic to receive an offer from Airbnb to take those photographs for them. In mid-to-late 2011, Airbnb had 20 photographers in the field taking pictures for hosts, roughly the same period in which we see the proverbial “hockey stick” of growth in nights booked.

Summary:
• Airbnb’s team had a hunch that better photos would increase rentals.
• They tested the idea with a Concierge MVP, putting the least effort possible into a test that would give them valid results.
• When the experiment showed good results, they built the necessary components and rolled it out to all customers.

Analytics Lessons Learned: Sometimes, growth comes from an aspect of your business you don’t expect. When you think you’ve found a worthwhile idea, decide how to test it quickly, with minimal investment. Define what success looks like beforehand, and know what you’re going to do if your hunch is right.
5.3) Case study on Indian Elections – The first Prime Minister to use BIG DATA

What makes Modi’s use of big data so impressive is that it was both relatively new to Indian politics and fraught with unique challenges. Take, for example, the size of the Indian electorate. With 814 million voters, in comparison to the USA’s 193.6 million and the UK’s 45.5 million, the sheer volume of data on India’s voting population was perhaps the largest obstacle. The second was the variety of data: India’s voter rolls, in 12 different languages and spread over 900,000 PDFs amounting to 25 million pages, made for a heterogeneous, non-uniform and deeply diverse information set. Finally, the veracity of the information was often questionable: one report noted that some voters were listed as 19,545 years old, and others a confounding 0 years old. Name overlap (there are 327,000 women named “Sita” in Bihar alone) only further complicated the process.

Despite these challenges, the rewards are significant, as Modi clearly demonstrated by employing this data to “drive donations, enroll volunteers, and improve the effectiveness of everything from door knocks…to social media”. BJP’s website, for example, planted cookies on the computers of all visitors to its site, and then used information about these users’ further internet activity (i.e., the sites they visited after BJP’s) for customised advertisements:

“If you move out of the BJP website and visit a website for bikes followed by a search on jobs, the algorithm will make the inference that you are a young male from a particular constituency, say Delhi, who is currently on a job hunt. What happens next is when you visit a job searching portal like Naukri.com, this system pops up a contextual ad for you like ‘jobs in Delhi’. The BJP banner which is just below the results will tell you ‘There are no Jobs in Delhi. India deserves better’.”

Tactics like these, spanning both online and offline analytics and marketing, were the backbone of Modi’s success. He led the charge with both social media and the analysis of publicly available data. Whereas Indian politicians have been known to rely on “hunches and intuitions to gauge complex demographics of caste, religion, community and localities…,”
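The veracity problems described in this case study (voters recorded as 19,545 or 0 years old, and many voters sharing one name) are the kind of issue a basic validation pass is meant to surface before any analysis. The sketch below is an assumption about what such a pass could look like, not a description of the campaign's actual pipeline: the record layout, function names, sample records and the upper age bound of 120 are all hypothetical, while 18 is the voting age in India.

```python
from collections import Counter

MIN_VOTING_AGE = 18      # voting age in India
MAX_PLAUSIBLE_AGE = 120  # assumed upper bound, chosen for illustration


def implausible_age_ids(records):
    """Return the ids of records whose age cannot belong to a real voter."""
    return [
        r["id"] for r in records
        if not MIN_VOTING_AGE <= r["age"] <= MAX_PLAUSIBLE_AGE
    ]


def shared_names(records):
    """Return (name, state) pairs that occur more than once, i.e. records
    that cannot be told apart by name alone."""
    counts = Counter((r["name"], r["state"]) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical voter-roll records.
records = [
    {"id": 1, "name": "Sita", "state": "Bihar", "age": 34},
    {"id": 2, "name": "Sita", "state": "Bihar", "age": 19545},
    {"id": 3, "name": "Ravi", "state": "Bihar", "age": 0},
    {"id": 4, "name": "Meena", "state": "Delhi", "age": 52},
]

suspect_ids = implausible_age_ids(records)
ambiguous = shared_names(records)
```

Flagged records would still need manual or rule-based resolution; the check only narrows down where the roll cannot be trusted.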
6) 15 Indian Big Data companies to watch out for:

1. Heckyl: A TechSparks 2011 winner in the financial data analytics space. Founded by Mukund Mudras, Som Sagar, Abhijit Vedak and Jaison Mathews.
2. Sigmoid Analytics: A TechSparks 2014 company based out of Bangalore, Sigmoid works in the area of real-time Big Data warehousing, streaming and ETL (extract, transform and load) on Apache Spark. It offers a technology infrastructure which companies can use to store their data in a desired format, perform operations on it and generate insights.
3. Flutura: Mines Big Data to perform analytics and extracts hidden insights from huge chunks of machine-generated data, helping global oil and gas majors improve efficiency and safety. Flutura was founded by Krishnan Raman, Derick Jose and Srikanth Muralidhara.
4. Indix: Computes real-time data to give product insights to decision makers on an intuitive dashboard. Founded by Sanjay Parthasarathy, the company has its product engineering center based out of Chennai.
5. Fractal Analytics: Helps companies in predictive analytics and decision sciences to understand, predict and shape consumer behavior through advanced analytics, harmonize data, tell visual stories and forecast business performance.
6. Crayon Data: Its algorithm, called the WhiteBox, Simpler Choices, takes massive data, cleans it up and presents only actionable insights to the banking, hospitality and telecom sectors. It was founded by Srikant Sastri, Suresh Shankar and Vijay Kumar.
7. Germin8: A leading data analytics company that helps brands with social media measurement and monitoring solutions by analysing conversations in real time. The Mumbai-based company was founded in 2007 by Raj Nair and his son Ranjit Nair.
8. Aureus Analytics: With its platform called ASAP (Aureus Statistical and Analytics Platform), it produces insights by mining enterprise data. Aureus was founded by technology professionals Anurag Shah, Ashish Tanna and Nitin Purohit.
9. Dataswft: A product of Bizosys Technologies Pvt Ltd, it has a customized search engine that can decode technical information and return search queries within milliseconds. It was founded by Sunil Guttula, Abinasha Karana and Sridhar Dhulipala.
10. C360: Corporate360 Pvt Ltd provides IT sales intelligence data services to enterprises. The startup was founded by college dropout Varun Chandran. Prior to founding Corporate360, Varun worked as a sales and marketing executive with the likes of SAP, Oracle, Dell and NetApp. C360 is based in India and Singapore. Another similarly located Big Data company is Antuit Holdings, which raised $56 million from Goldman Sachs and Zodius Capital.
11. Metaome: A health care Big Data company focused on life sciences, founded by Kalpana Krishnaswami and Ramkumar Nandkumar. Metaome’s product DistilBio, available as a free web-based graph search and as an enterprise platform, accrues a variety of data from different sources (laboratory data management systems, private and public databases) and structures it to help in identifying patterns.
12. Frrole: A social intelligence startup with an offering focused on media and brands, which allows its customers to integrate real-time Twitter data into their digital properties and TV shows. The startup was founded by Amarpreet Kalkat, Nishith Sharma and Abhishek Vaid.
13. Bridgei2i: Focuses on user-centric applications of Big Data. Its founders are Prithvijit Roy, Ashish Sharma and Pritam Kanti Paul.
14. Formcept: Focused on making data analysis accessible to everyone. Its founders are Suresh Srinivasan and Anuj Kumar.
15. PromptCloud: A DaaS (Data-as-a-Service) platform that crawls the web for data extraction. It was founded by Prashant Kumar.
7) Bibliography

1. "Data, data everywhere". The Economist. 25 February 2010. Retrieved 9 December 2012.
2. "Community cleverness required". Nature 455 (7209): 1. 4 September 2008.
3. "Sandia sees data management challenges spiral".
4. Reichman, O.J.; Jones, M.B.; Schildhauer, M.P. (2011). "Challenges and Opportunities of Open Data in Ecology".
5. Practical Guide to Controlled Experiments on the Web: Listen to Your Customers.
6. An introduction to a/b testing for marketing optimization.
7. Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy.
8. CDAS: A Crowdsourcing Data Analytics System, paper by NUS.
9. Case study about Shoppers Stop on http://www.livemint.com/Industry/J5NVBrcewAEM0qF02daqyL/Retail-sector-gains-big-from-Big-Data.html
10. Case study about Indian election on http://dataconomy.com/narendra-modi-first-prime-minister-use-big-data-analytics/
11. Case study on Airbnb on http://www.quibb.com/links/analytics-lessons-learned/view