Hadoop is an open-source software framework used for storing and processing large datasets in a distributed manner across commodity hardware. It was created in 2005 by Doug Cutting and Mike Cafarella to address the issue of processing big data at a reasonable cost and time. Hadoop uses HDFS for storage and MapReduce for processing data distributed over a network of nodes in parallel. It allows organizations to gain insights from vast amounts of structured and unstructured data faster and at lower costs than traditional approaches.
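The MapReduce model mentioned above can be illustrated with a minimal word-count sketch in plain Python (no Hadoop cluster required; the map, shuffle, and reduce phases are only simulated here to show the flow of data):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Two example "documents" standing in for splits stored on HDFS nodes.
docs = ["big data needs big tools", "hadoop stores big data"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

In real Hadoop the map and reduce tasks run in parallel on different nodes; this sketch only shows the logical phases.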
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ... (Simplilearn)
In this Big Data presentation, we discuss the growth of big data over the last few years, followed by various big data applications. We look at the sectors where big data is used, such as weather forecasting, healthcare, media and entertainment, logistics, travel and tourism, and finally the government and law enforcement sector.
We will discuss how the industries below are using Big Data:
1. Weather forecast
2. Media and entertainment
3. Healthcare
4. Logistics
5. Travel and tourism
6. Government and law enforcement
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand Resilient Distributed Datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL: creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
This presentation is an introduction to the importance of Data Analytics in Product Management. During this talk, Etugo Nwokah, former Chief Product Officer for WellMatch, covered how to define Data Analytics and why it should be a first-class citizen in any software organization.
Understanding big data and data analytics (Seta Wicaksana)
Big Data helps companies to generate valuable insights. Companies use Big Data to refine their marketing campaigns and techniques. Companies use it in machine learning projects to train machines, predictive modeling, and other advanced analytics applications.
Big Data Analytics: Recent Achievements and New Challenges (Editor IJCATR)
In the current era, big data is being generated by everything around us at all times. Every digital process and social media exchange produces it; systems, sensors, and mobile devices transmit it. Big data arrives from multiple sources at an alarming velocity, volume, and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities, and skills. Big data has become an important issue for a large number of research areas, such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The combination of big data technologies and traditional machine learning algorithms has generated new and interesting challenges in areas such as social media and social networks. These new challenges focus mainly on problems such as data processing, data storage, data representation, and how data can be used for pattern mining, analysing user behaviours, and visualizing and tracking data, among others. This paper discusses the new concepts of big data and data analytics, along with the tools and methodologies designed to allow efficient data mining and information fusion from social media, and the new applications and frameworks currently appearing under the "umbrella" of the social network, social media, and big data paradigms.
Societal Impact of Applied Data Science on the Big Data Stack (Stealth Project)
Data availability should ideally improve accountability and decision processes. Armed with evidence of data science working across multiple domains, from healthcare analytics to internet advertising, big data is enabling changes in society, one application at a time. This talk will have two parts. We will first present a data scientist's overview of different technologies in use today and their utility.
Then we will do a deep dive on specific implementations and challenges we addressed while working with multiple partners in the healthcare industry on real-world healthcare data. We will discuss and demonstrate prototypes of our solutions for cost prediction and risk-of-readmission care management, and how we leveraged big data machine learning frameworks. We will end with an open conversation about challenges in verticals other than healthcare and provide an overview of ongoing efforts for social good at the University of Washington Center for Data Science; each a story in its own right.
The objective of this module is to provide an overview of the basic information on big data.
Upon completion of this module you will:
-Comprehend the emerging role of big data
-Understand the key terms regarding big and smart data
-Know how big data can be turned into smart data
-Be able to apply the key terms regarding big data
Using big data to turn shoppers into big spenders
Applying the range, volume and velocity of retail data
Advanced modelling, forecasting and segmentation of data
Cultivating and retaining loyal customers using internal and external data
Optimising analytics in merchandising and demand forecasting Increasing store level profitability and competitiveness
Using analytics to detect retail fraud
Big data is a term that describes data volumes so large or complex that traditional data processing software and techniques are insufficient to deal with them. Big data is often noisy, heterogeneous, irrelevant, and untrustworthy. As the speed of information growth has exceeded Moore's Law at the beginning of this new century, excessive data is causing great trouble for human beings, because data with these special attributes cannot be managed and processed by current traditional software systems, which has become a real problem. This paper discusses some of the big data challenges and problems faced by organizations. These challenges may relate to heterogeneity, scale, timeliness, privacy, and human collaboration. A survey method was used as the theoretical solution framework, consisting of a questionnaire covering the challenges and problems faced by organizations. After identifying these problems and challenges, a solution was given to help organizations address their big data challenges.
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S... (IJSCAI Journal)
All types of machine-automated systems are generating large amounts of data in different forms, such as statistical, text, audio, video, sensor, and biometric data; this gives rise to the term Big Data. In this paper we discuss the issues, challenges, and applications of these types of Big Data with consideration of the big data dimensions. We cover social media data analytics, content-based analytics, text data analytics, and audio and video data analytics, along with their issues and expected application areas. This will motivate researchers to address these issues of storage, management, and retrieval of data known as Big Data. The use of Big Data analytics in India is also highlighted.
Introduction to Big Data: Definition, Characteristic Features, Big Data Applications, Big Data vs Traditional Data, Risks of Big Data, Structure of Big Data, Challenges of Conventional Systems, Web Data, Evolution of Analytic Scalability, Evolution of Analytic Processes, Tools and methods, Analysis vs Reporting, Modern Data Analytic Tools
This talk is an introduction to Data Science. It explains Data Science from two perspectives: as a profession and as a discipline. While covering the benefits of Data Science for business, it explains how to get started with embracing data science in business.
Big data refers to the vast amount of structured and unstructured data that inundates organizations on a daily basis. This data comes from various sources such as social media, sensors, digital transactions, mobile devices, and more.
2. Agenda
What is Data & Big Data
Sources of big data
Facts and Figures of Big Data
Processing Big Data – Hadoop
Hadoop Introduction: Hadoop’s history and advantages
Hadoop Architecture in detail
Hadoop in industry
3. Data: Data is the collection of raw facts and figures. Because it is unprocessed, data is called a collection of raw facts and figures.
Examples of Data:
Student data on admission forms - a bundle of admission forms contains name, father’s name, address, course, photograph, etc.
Census report, data of citizens - during a census, data is collected on all citizens, such as the number of persons living in a home, whether they are literate or illiterate, number of children, caste, religion, etc.
4. Information: Processed data is called information. When raw facts and figures are processed and arranged in some proper order, they become information. Information has proper meaning and is useful in decision-making.
Examples of information:
Student address labels - stored data of students can be used to print address labels. These address labels are used to send any intimation/information to students at their home addresses.
Census report, total population - census data is used to produce reports about the total population of a country and its literacy rate, as well as the total population of males, females, children, and aged persons, and of persons in different categories like caste, religion, and age group.
5. Data to Information:
Example: the data collected in a survey report is ‘HYD20M’.
If we process this data, we can decode the information about a person as follows:
HYD is the city name ‘Hyderabad’,
20 is the age,
M represents ‘MALE’.
7. International System of Units (SI)
Kilobyte (KB) = 10^3 bytes
Megabyte (MB) = 10^6 bytes
Gigabyte (GB) = 10^9 bytes
Terabyte (TB) = 10^12 bytes
Petabyte (PB) = 10^15 bytes
Exabyte (EB) = 10^18 bytes
Zettabyte (ZB) = 10^21 bytes
Yottabyte (YB) = 10^24 bytes
Units of data: When dealing with big data, we represent sizes in units like megabytes, gigabytes, terabytes, and beyond. The table above shows the system of units used to represent data.
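The unit ladder above can be applied programmatically. Here is a minimal sketch (not from the deck) that expresses a byte count in the largest SI unit that keeps the value at or above 1:

```python
SI_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes):
    """Express a byte count in the largest SI unit keeping the value >= 1."""
    value = float(num_bytes)
    for unit in SI_UNITS:
        if value < 1000 or unit == SI_UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000  # step up one SI unit (factor of 10^3)

print(human_readable(1.2e12))  # 1.2 TB
```

Note this uses decimal (SI, factor 1000) units as in the table above, not binary (KiB/MiB, factor 1024) units.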
9. Big Data is a huge collection of data, as the name “BIG DATA” suggests. It can’t be processed by traditional methods because most of the data generated is in unstructured form.
According to Gartner:
Big data is high-volume, high-velocity, and high-variety information assets that demand innovative platforms for enhanced insights and decision making.
10. Types of Big Data:
1. Structured Data: Any data that can be stored, accessed, and processed in a fixed format is termed ‘structured’ data.
2. Unstructured Data: Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it comes to processing it to derive value.
3. Semi-Structured Data: Semi-structured data can contain both forms of data. It appears structured in form but is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.
12. 3Vs of Big Data:
1. Volume: Volume refers to the amount of data, which is growing day by day at a very fast pace. The size of data generated by humans, machines, and their interactions on social media alone is massive.
2. Variety: Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
3. Velocity: Velocity refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential in the data.
14. Sources of Big Data:
Sensor networks
Social media
Public web
Purchase records
Medical records
Airlines
Scientific research
And so on.
15. Facts and Figures of Big Data:
• The big data growth we’ve been witnessing is only
natural. We constantly generate data. On Google alone,
we submit 40,000 search queries per second. That
amounts to 1.2 trillion searches yearly!
• Each minute, 300 new hours of video show up on
YouTube. That’s why there’s more than 1 billion
gigabytes (1 exabyte) of data on its servers!
• People share more than 100 terabytes of data on
Facebook daily. Every minute, users send 31 million
messages and view 2.7 million videos.
• Big data usage statistics indicate that people take about 80%
of photos on their smartphones. Considering that over
1.4 billion devices will be shipped worldwide this year
alone, we can only expect this percentage to grow.
• Smart devices (for example, fitness trackers, sensors,
Amazon Echo) produce 5 quintillion bytes of data daily.
17. Key Stats:
• The big data market is expected to reach an estimated value of $103 billion by 2023.
• An estimated 97.2% of companies are starting to invest in big data technology.
• Every day, internet users create 2.5 quintillion bytes of data.
• IDC’s Digital Universe Study from 2012 found that just 0.5% of data was actually being analyzed.
• It is estimated that by 2021 every person will generate about 1.7 megabytes of data per second.
• Companies like Netflix leverage Big Data to save US$1 billion per year on customer retention.
• 80-90% of the data we generate today is unstructured.
18. INTERESTING FACTS
According to studies, the human brain can
store about 2.5 petabytes of data.
We generate 2.5 quintillion bytes of data
daily.
By 2020, every person will generate 1.7
megabytes in just a second.
19. DATA ANALYTICS
Gartner is predicting that companies that aren’t investing heavily in analytics by the end of 2020 may not be in business in 2021. (It is assumed small businesses, such as self-employed handymen, gardeners, and many artists, are not included in this prediction.)
20. DATA ANALYTICS
Data Analytics is the process of examining raw data (data sets) with the purpose of drawing conclusions about that information, increasingly with the aid of specialized systems and software.
Data Analytics involves applying an algorithmic or mechanical process to derive insights: for example, running through a number of data sets to look for meaningful correlations between them.
It is used in a number of industries to allow organizations and companies to make better decisions, as well as to verify or disprove existing theories or models.
The focus of Data Analytics lies in inference, which is the process of deriving conclusions based solely on what the researcher already knows.
21. Big Data and Analytics
• Surprisingly, 99.5% of collected data never gets used
or analysed. So much potential wasted!
• Less than 50% of the structured data collected from IoT is
used in decision making.
• Predictive analytics are becoming more and more crucial for
success. 79% of executives believe that failing to embrace
big data will lead to bankruptcy. This explains why 83% of
companies invest in big data projects.
• Fortune 1000 companies can gain more than $65
million in additional net income just by increasing their data
accessibility by 10%.
• Healthcare could also vastly benefit from big data analytics
adoption: as much as $300 billion could be saved yearly!
• Companies that harness big data’s full power
could increase their operating margins by up to 60%!
23. 1. DESCRIPTIVE ANALYTICS:
The simplest way to define descriptive analytics is that it answers
the question “What has happened?”
This type of analytics analyses both real-time and historical data
for insights on how to approach the future.
The main objective of descriptive analytics is to find out the reasons
behind previous success or failure.
The ‘past’ here refers to any particular time at which an event
occurred, whether a month ago or just a minute ago.
The vast majority of big data analytics used by organizations falls
into the category of descriptive analytics; around 90% of organizations
today use this most basic form of analytics.
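As a minimal sketch of descriptive analytics, the following toy example (with entirely hypothetical monthly sales figures) simply summarises what has already happened:

```python
# Hypothetical monthly sales records; descriptive analytics just
# aggregates and summarises the historical data.
sales = [
    {"month": "Jan", "revenue": 120_000},
    {"month": "Feb", "revenue": 95_000},
    {"month": "Mar", "revenue": 140_000},
]

total = sum(r["revenue"] for r in sales)       # what was earned overall
average = total / len(sales)                   # typical month
best = max(sales, key=lambda r: r["revenue"])  # best-performing month

print(f"total={total}, average={average:.0f}, best month={best['month']}")
```

Real descriptive analytics runs the same kind of aggregation, only over far larger data sets and with dedicated BI tooling.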
24. 2. DIAGNOSTIC ANALYTICS
At this stage, historical data can be measured against other
data to answer the question of why something happened.
Companies go for diagnostic analytics because it gives deep insight
into a particular problem. At the same time, a company should
have detailed information at its disposal; otherwise, data
collection may have to be repeated for every single issue,
which is time-consuming.
Eg: Consider examples from different industries: a healthcare
provider compares patients’ responses to a promotional campaign
in different regions; a retailer drills sales down into
subcategories.
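A toy sketch of the drill-down idea, using hypothetical per-region campaign-response data: segmenting the aggregate number exposes *where* (and hence hints at *why*) the response differs:

```python
# Hypothetical (region, responded) records; diagnostic analytics asks
# why an overall response rate looks the way it does by drilling into
# segments such as region.
responses = [
    ("north", 1), ("north", 0), ("north", 1),
    ("south", 0), ("south", 0), ("south", 1),
]

by_region = {}
for region, responded in responses:
    hits, count = by_region.get(region, (0, 0))
    by_region[region] = (hits + responded, count + 1)

for region, (hits, count) in by_region.items():
    print(f"{region}: {hits / count:.0%} response rate")
```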
25. 3. PREDICTIVE ANALYTICS
Predictive analytics tells what is likely to happen. It uses the
findings of descriptive and diagnostic analytics to detect tendencies,
clusters and exceptions, and to predict future trends, which makes it a
valuable tool for forecasting.
Despite the numerous advantages that predictive analytics brings, it is
essential to understand that a forecast is just an estimate whose
accuracy highly depends on data quality and the stability of the
situation, so it requires careful treatment and continuous
optimization.
Eg: A management team can weigh the risks of investing in their
company’s expansion based on cash flow analysis and
forecasting. Organizations like Walmart, Amazon and other retailers
leverage predictive analytics to identify sales trends from
customers’ purchase patterns, forecast customer behavior and
inventory levels, predict which products customers are likely to
purchase together (so that they can offer personalized
recommendations), and predict sales at the end of the quarter or
year.
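As a toy illustration of the forecasting idea (not how the retailers above actually do it), a least-squares trend line can be fitted to hypothetical quarterly sales and extrapolated one quarter ahead:

```python
# Hypothetical quarterly sales; fit y = slope*x + intercept by ordinary
# least squares and extrapolate to the next quarter. Real predictive
# analytics uses far richer models and validated data.
quarters = [1, 2, 3, 4]
sales = [100.0, 110.0, 125.0, 135.0]

n = len(quarters)
mean_x = sum(quarters) / n
mean_y = sum(sales) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(quarters, sales)) \
        / sum((x - mean_x) ** 2 for x in quarters)
intercept = mean_y - slope * mean_x

forecast_q5 = slope * 5 + intercept  # extrapolate one quarter ahead
print(f"trend: {slope:.1f} per quarter, Q5 forecast: {forecast_q5:.1f}")
```

As the slide cautions, such an extrapolation is only an estimate: it silently assumes the linear trend continues, which is exactly the “stability of the situation” caveat above.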
26. 4. PRESCRIPTIVE ANALYTICS
The purpose of prescriptive analytics is to literally prescribe what action to take to
eliminate a future problem or take full advantage of a promising trend. It is a
combination of data, mathematical models and various business rules.
The data for prescriptive analytics can be both internal (within the organization)
and external (like social media data).
Besides, prescriptive analytics uses sophisticated tools and technologies, like
machine learning, business rules and algorithms, which makes it complex to
implement and manage. That is why, before deciding to adopt prescriptive
analytics, a company should compare the required effort against the expected
added value.
Prescriptive analytics is comparatively complex in nature, and many companies
are not yet using it in day-to-day business activities because it is difficult to
manage. Large-scale organizations use prescriptive analytics for scheduling
inventory in the supply chain, optimizing production, etc., in order to optimize
the customer experience.
An example of prescriptive analytics: a multinational company was able to identify
opportunities for repeat purchases based on customer analytics and sales history.
28. NEED FOR BIG DATA ANALYTICS
The new benefits that big data analytics brings to the table
are speed and efficiency. Whereas a few years ago a
business would have gathered information, run analytics and
unearthed insights that could be used for future
decisions, today that business can identify insights for
immediate decisions. The ability to work faster, and stay
agile, gives organizations a competitive edge they didn’t
have before.
Big data analytics helps organizations harness their data and
use it to identify new opportunities. That, in turn, leads to
smarter business moves, more efficient operations, higher
profits and happier customers in the following ways:
29. NEED FOR BIG DATA ANALYTICS
Cost reduction: Big data technologies such as Hadoop and cloud-based analytics bring
significant cost advantages when it comes to storing large amounts of data – plus they
can identify more efficient ways of doing business.
Faster, better decision making: With the speed of Hadoop and in-memory analytics,
combined with the ability to analyze new sources of data, businesses are able to
analyze information immediately – and make decisions based on what they’ve learned.
New products and services: With the ability to gauge customer needs and satisfaction
through analytics comes the power to give customers what they want. Davenport
points out that with big data analytics, more companies are creating new products to
meet customers’ needs.
End Users Can Visualize Data: While the business intelligence software market is
relatively mature, a big data initiative is going to require next-level data visualization
tools, which present BI data in easy-to-read charts, graphs and slideshows. Due to the
vast quantities of data being examined, these applications must be able to offer
processing engines that let end users query and manipulate information quickly—even
in real time in some cases.
30. Use case: Why Hadoop is needed
Problem: An e-commerce site XYZ (with 100 million
users) wants to offer a $100 gift voucher to its top 10
customers who spent the most in the previous year.
Moreover, they want to find the buying trends of these
customers so that the company can suggest more related
items to them.
Issues:
Huge amount of unstructured data which needs to be
stored, processed and analyzed.
31. Solution:
Apache Hadoop is not only a storage system but a platform
for both data storage and processing.
Storage: To store this huge amount of data, Hadoop uses HDFS
(Hadoop Distributed File System), which uses commodity
hardware to form clusters and store data in a distributed
fashion. It works on a write-once, read-many-times principle.
Processing: The MapReduce paradigm is applied to the data
distributed over the network to find the required output.
Analyze: Pig and Hive can be used to analyze the data.
Cost: Hadoop is open source, so cost is no longer an issue.
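As a rough, single-machine illustration (not real Hadoop API code), the XYZ use case maps onto MapReduce like this: map each purchase record to a (customer, amount) pair, reduce by summing per customer, then rank the totals. The purchase log below is hypothetical:

```python
# Hypothetical purchase log: (customer, amount) pairs, as the map phase
# would emit them from raw order records.
from collections import defaultdict

purchases = [
    ("alice", 250.0), ("bob", 120.0), ("alice", 300.0),
    ("carol", 500.0), ("bob", 80.0), ("dave", 40.0),
]

# Shuffle + reduce: group by customer key and sum the amounts.
spend = defaultdict(float)
for customer, amount in purchases:
    spend[customer] += amount

# Rank by total spend (top 2 here; XYZ would take the top 10).
top = sorted(spend.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top)
```

On a real cluster the map, shuffle and reduce steps run in parallel across nodes holding different slices of the purchase history, which is what makes the computation feasible for 100 million users.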
32. INTRODUCTION TO HADOOP
Processing Big Data - Hadoop
Designed to answer the question: “How can we
process big data with reasonable cost and time?”
Answer: We have a savior to deal with Big Data
challenges: it’s Hadoop.
33. Apache Hadoop
Hadoop is an open-source software framework used for
storing and processing Big Data in a distributed manner on
large clusters of commodity hardware. Hadoop is an Apache
Software Foundation (ASF) project, released under the Apache License.
Created by Doug Cutting and Mike Cafarella in 2005.
Cutting named the project after his son’s toy elephant.
Hadoop, with its distributed processing, handles large
volumes of structured and unstructured data more
efficiently than the traditional enterprise data warehouse.
Hadoop makes it possible to run applications on systems
with thousands of commodity hardware nodes, and to
handle thousands of terabytes of data. Organizations are
adopting Hadoop because it is open-source software
and can run on commodity hardware.
35. HADOOP’S DEVELOPERS
Doug Cutting
2005: Doug Cutting and Michael J.
Cafarella developed Hadoop to support
distribution for the Nutch search engine
project.
The project was funded by Yahoo.
2006: Yahoo gave the project to Apache
Software Foundation.
37. SOME HADOOP MILESTONES
• 2008 - Hadoop Wins Terabyte Sort Benchmark (sorted 1 terabyte of
data in 209 seconds, compared to previous record of 297 seconds)
• 2009 - Avro and Chukwa became new members of the Hadoop
family
• 2010 - Hadoop's HBase, Hive and Pig subprojects graduated, adding
more computational power to the Hadoop framework
• 2011 - ZooKeeper graduated
• 2013 - Hadoop 1.1.2 and Hadoop 2.0.3 alpha released.
- Ambari, Cassandra and Mahout were added
39. FEATURES OF HADOOP
• Abstracts and facilitates the storage and processing of
large and/or rapidly growing data sets
• Handles structured and unstructured data
• Simple programming models
• Suitable for Big Data analysis
• High scalability and availability
• Uses commodity (cheap!) hardware with little redundancy
• Fault tolerance
• Moves computation rather than data
44. HADOOP’S ARCHITECTURE
• Hadoop Distributed Filesystem
• Tailored to needs of MapReduce
• Targeted towards many reads of filestreams
• Writes are more costly
• High degree of data replication (3x by default)
• No need for RAID on normal nodes
• Large block size (64 MB by default in Hadoop 1.x, 128 MB in 2.x)
• Location awareness of DataNodes in network
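As a back-of-the-envelope sketch of the storage model above, assuming the 64 MB default block size and 3x replication, a hypothetical 1 GB file works out as follows:

```python
# HDFS storage maths: a file is split into fixed-size blocks and each
# block is replicated (3x by default) across DataNodes.
import math

BLOCK_SIZE_MB = 64      # 64 MB default block size (Hadoop 1.x)
REPLICATION = 3         # default replication factor

file_size_mb = 1_000    # hypothetical ~1 GB file
blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)

# The last block is partial; HDFS only stores the bytes actually written,
# so replicated storage is file size times the replication factor.
replicated_mb = file_size_mb * REPLICATION

print(f"{blocks} blocks, ~{replicated_mb} MB stored across the cluster")
```

The large block size keeps the NameNode's per-file metadata small and favours the long sequential reads that MapReduce performs.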
45. HADOOP’S ARCHITECTURE
NameNode:
• Stores metadata for the files, like the directory structure of a
typical FS.
• The server holding the NameNode instance is quite crucial,
as there is only one.
• Transaction log for file deletes/adds, etc. Does not use
transactions for whole blocks or file-streams, only metadata.
• Handles creation of more replica blocks when necessary
after a DataNode failure
46. HADOOP’S ARCHITECTURE
DataNode:
• Stores the actual data in HDFS
• Can run on any underlying filesystem (ext3/4, NTFS, etc)
• Notifies NameNode of what blocks it has
• Default placement policy: the NameNode keeps one replica on the
local node and places the remaining two on nodes of a remote rack
49. HADOOP’S ARCHITECTURE
MapReduce Engine:
• JobTracker & TaskTracker
• JobTracker splits the job into smaller tasks (“Map”) and
sends them to the TaskTracker process on each node
• Each TaskTracker reports back to the JobTracker with job
progress, sends data (“Reduce”) or requests
new jobs
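The flow that the JobTracker and TaskTrackers coordinate is easiest to see in the classic word-count example. Below is a minimal single-machine sketch (with hypothetical input lines) of the map, shuffle and reduce phases that a real cluster would distribute:

```python
# Classic MapReduce word count, sketched on one machine.
from collections import defaultdict

lines = ["big data needs hadoop", "hadoop processes big data"]

# Map: each input line yields (word, 1) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: sum each group to get the final counts.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)
```

In Hadoop proper, the map and reduce functions run as TaskTracker tasks on the nodes that hold the data blocks, and the framework performs the shuffle over the network.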
50. HADOOP’S ARCHITECTURE
• None of these components are necessarily limited to using
HDFS
• Many other distributed file-systems with quite different
architectures work
• Many other software packages besides Hadoop's
MapReduce platform make use of HDFS
51. WHY USE HADOOP?
Need to process multi-petabyte datasets
Data may not have a strict schema
Expensive to build reliability into each application
Nodes fail every day
Need for a common infrastructure
Very large distributed file system
Assumes commodity hardware
Optimized for batch processing
Runs on heterogeneous OSes