Big data gaurav
Big Data Class 1

© All Rights Reserved

  • Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.
  • Class 1 gives you an overview of Big Data. It introduces the concept and gives a broad overview of the business applications of Big Data. The module also gives a broad understanding of the technology infrastructure required to store, handle, and manage Big Data. Finally, it delves a little deeper into the Hadoop ecosystem and the MapReduce framework, and explains how these popular frameworks support Big Data management. We will explore each of these topics in more detail as we go.
  • The first topic we will discuss today is what Big Data is, and what its advantages and sources are.
  • We will divide Class 1 into three broad parts: the history and evolution of Big Data; the structuring of Big Data and the elements that comprise it; and Big Data applications in business analytics, along with the career opportunities associated with studying Big Data.
  • If you think of the world around you, an enormous amount of data is generated, captured, and transferred through various media within seconds. This data may come from personal computers, social networking sites, an organization's transaction or communication systems, ATMs, and multiple other channels.
  • Some reports estimate that there were about 5 exabytes of online data in existence in 2002. Each exabyte is a massive 1,000,000 terabytes (TB). By 2009, that number had risen to 281 exabytes, a 56-fold increase, and it has continued to grow rapidly since. This data is created in the form of posts, pictures, videos, and weather information.
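The unit arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where each step from KB to MB to GB and so on is a factor of 1,000; binary units such as TiB would give slightly different numbers.

```python
# SI (decimal) storage units: each step up is a factor of 1,000.
POWERS = {"KB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5, "EB": 6, "ZB": 7}

def convert(value, from_unit, to_unit):
    """Convert between SI storage units, e.g. exabytes to terabytes."""
    return value * 1000 ** (POWERS[from_unit] - POWERS[to_unit])

# 1 exabyte expressed in terabytes
tb_per_eb = convert(1, "EB", "TB")

# Growth factor between the 2002 (5 EB) and 2009 (281 EB) estimates
growth = 281 / 5

print(f"1 EB = {tb_per_eb:,} TB")        # 1 EB = 1,000,000 TB
print(f"281 EB / 5 EB = {growth:.1f}x")  # 281 EB / 5 EB = 56.2x
```

The 56.2x ratio is where the "56-fold increase" comes from.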
  • This accumulation results in the continuous generation of an enormous volume of data which, if analyzed intelligently, can be of immense value, as it can give us a variety of critical information for making smarter decisions. In other words, careful analysis can transform this data into information, and information into insight.
  • The need to analyze and offer this critical data in a systematic and comprehensive manner has led to the rise of a much-discussed term, and the pivot of this course: Big Data.
  • Big Data is a pool of large datasets that must be captured, stored, searched, shared, transferred, analyzed, and visualized within an acceptable elapsed time.
  • Big Data assimilation is the process of examining large amounts of data to gain insight.
  • As data continues to grow, it needs to be organized and made available so that it can be used as a source of information. Earlier, due to a lack of access to data and of the means to process it, the potential of Big Data remained largely untapped.
  • There are three main factors to consider when talking about Big Data, so let's take a quick look at each of them.
    - It is a new kind of data, and a challenging one, because it requires different systems to be leveraged in different ways.
    - It is characterized in terms of Volume, Variety, and Velocity. Volume refers to the amount of data; Variety refers to its type (internal, external, behavioural, or social); and Velocity refers to how quickly it is assimilated, that is, how close to real time it is. We will look at these concepts in more detail in later classes.
    - Lastly, Big Data is largely unstructured and qualitative in nature, which, together with its sheer volume, is what makes it "big".
  • Big Data is a new kind of challenge because, besides its enormous implications, its significance keeps increasing as data grows. Today, Big Data can mean anything from a single terabyte to a petabyte or an exabyte of data.
  • The systematic study of Big Data across sectors and geographies can lead to results such as:
    - Understanding target customers better
    - Reduced expenditure in the healthcare sector
    - Increased operating margins in the retail sector
    - Savings of several billion dollars through improved operational efficiency
  • Across industries, data combined with analytics can transform major business processes in various ways, such as:
    - Improving performance in sports by analyzing and tracking performance and behavior
    - Improving science and research
    - Improving security and law enforcement by enabling better monitoring
    - Improving financial trading by enabling more informed decisions
  • Across organizations, the right analysis of available data can transform major business processes in various ways, such as in …
  • Google applied its massive data-collecting power to raise warnings of flu outbreaks approximately two weeks ahead of the existing public health services. To do this, Google monitored the search behavior of millions of users and followed clusters of queries on themes such as flu symptoms, chest congestion, and buying a thermometer. Google analyzed this collected data and produced consolidated results that gave strong indications of flu levels across America.
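The core aggregation idea, counting what share of searches in each region and week touch on flu-related themes, can be sketched in Python. This is a simplified illustration only: the keyword list, the log format, and the sample queries are all hypothetical, and Google's actual model was considerably more involved than a keyword count.

```python
from collections import Counter

# Hypothetical keyword list; the actual query set Google used is not given here.
FLU_TERMS = ("flu symptoms", "chest congestion", "thermometer")

def flu_query_share(query_log):
    """Return {(week, region): fraction of queries that mention a flu-related term}."""
    total = Counter()
    flu = Counter()
    for week, region, query in query_log:
        key = (week, region)
        total[key] += 1
        text = query.lower()
        if any(term in text for term in FLU_TERMS):
            flu[key] += 1
    return {key: flu[key] / total[key] for key in total}

# Toy, made-up log entries: (week, region, query text)
log = [
    (1, "Ohio", "flu symptoms in adults"),
    (1, "Ohio", "pizza near me"),
    (1, "Ohio", "buy digital thermometer"),
    (1, "Texas", "weather tomorrow"),
    (2, "Ohio", "chest congestion remedy"),
    (2, "Texas", "flu symptoms kids"),
    (2, "Texas", "new running shoes"),
]

for (week, region), share in sorted(flu_query_share(log).items()):
    print(f"week {week} {region}: {share:.2f}")
```

A rising share in a region would then be compared against official surveillance figures to check whether it tracks real flu activity.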
  • Besides the more obvious reference to volume, Big Data is also so called because of the many types and sources of data involved. Let's look at some of these source types and their usage.
    Think of social data from sources like Facebook or Twitter, and how much it can tell us about the people using them and their behavioral patterns. Or machine data, such as GPS outputs that can track our movements across the globe. Or transactional data, generated when we order a new pair of shoes online or buy a pizza.
  • The need for Big Data is evident. If leaders and economies want exemplary growth and wish to generate value for all their stakeholders, Big Data has to be embraced and used extensively.
  • Now we will look at the structuring and elements of Big Data.
  • In your daily life, you may have come across questions like:
  • Today, computers can answer such questions. Recommendation systems can analyze and structure a large amount of data specifically for you, based on what you searched for, what you looked at, and for how long, and then present you with information customized to your behavior and habits.
    This is called structuring of data. It is what comes into play when your favorite shopping site presents you with a well-chosen set of recommendations when you log in: technology is used to study and analyze data to understand user behavior, requirements, and preferences, and to make personalized recommendations for every individual.
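One simple way a recommendation system can turn browsing data into suggestions is item co-occurrence: items that other people viewed together with the things you viewed are ranked by how often that happened. The Python sketch below is illustrative only; the user names and items are hypothetical, and real shopping sites use far richer signals (time spent, purchases, ratings).

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    co = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co

def recommend(user, histories, co, k=2):
    """Score unseen items by how often they co-occur with items the user already viewed."""
    seen = set(histories[user])
    scores = Counter()
    for (a, b), count in co.items():
        if a in seen and b not in seen:
            scores[b] += count
    # Sort by descending score, then alphabetically for a stable tie-break
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

# Hypothetical browsing histories
histories = {
    "asha": ["running shoes", "socks", "water bottle"],
    "ben": ["running shoes", "socks", "fitness tracker"],
    "chen": ["running shoes", "fitness tracker"],
    "dana": ["running shoes"],
}

co = build_cooccurrence(histories)
print(recommend("dana", histories, co))  # ['fitness tracker', 'socks']
```

Because "dana" has only viewed running shoes, the items most often viewed alongside running shoes by other users come out on top.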
  • Data that comes from multiple sources, such as databases, Enterprise Resource Planning (ERP) systems, weblogs, chat history, and GPS maps, varies in format. These different formats need to be made consistent and clear before the data can be used for analysis.
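Making different formats consistent usually means mapping every source onto one common record layout. The Python sketch below assumes two hypothetical sources: an ERP CSV export with day/month/year dates and a web log with one JSON object per line and Unix timestamps. Both field names and formats are assumptions chosen for illustration.

```python
import csv
import io
import json
from datetime import datetime, timezone

def from_erp_csv(text):
    """ERP export: comma-separated, day/month/year dates (assumed format)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "source": "erp",
            "customer_id": row["CustID"].strip(),
            "timestamp": datetime.strptime(row["Date"], "%d/%m/%Y").replace(tzinfo=timezone.utc),
        }

def from_weblog_json(lines):
    """Web log: one JSON object per line with a Unix epoch timestamp (assumed format)."""
    for line in lines:
        event = json.loads(line)
        yield {
            "source": "weblog",
            "customer_id": str(event["user"]),
            "timestamp": datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        }

erp_text = "CustID,Date\n C042 ,05/03/2013\n"
weblog_lines = ['{"user": "C042", "ts": 1362484800}']

# Both sources now share one schema: source, customer_id, timezone-aware timestamp
records = list(from_erp_csv(erp_text)) + list(from_weblog_json(weblog_lines))
for r in records:
    print(r["source"], r["customer_id"], r["timestamp"].isoformat())
```

Once both records share the same fields, they can be joined on `customer_id` or ordered by `timestamp` without caring where each came from.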
  • Data acquired from various sources can be categorized primarily into the following types of sources:
    - Internal sources, such as organizational or enterprise data, which can be used to support the business operations of an organization.
    - External sources, such as social data from the Internet or the government, which can be analyzed to formulate policy and to understand the market, the environment, or technology.
  • Have a look at the table on your screen. You'll see that sources can be internal or external, but they usually provide three kinds of data. It is when all three kinds come together that we can actually see what Big Data is. You'll note that unstructured data is typically larger in volume than structured and semi-structured data. Let's take a closer look at each of these data types.
  • Structured data can be defined as a set of data with a defined repeating pattern. This pattern makes it easier for any program to sort, read, and process it. As a result, structured data can be processed much faster than data without a specific repeating pattern.
  • Let's take a quick look at a sample of structured data, in which the attribute data for every customer is stored as individual data points in defined fields. From this, let's try to derive a few features of structured data. Structured data:
    - Is organized in a predefined format
    - Resides in fixed fields within a record or file
    - Is formatted so that entities and their attributes are mapped
    - Is used to query and report against predetermined data types
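Because every structured record has the same fixed fields, a database can filter and aggregate it directly. The Python sketch below uses the standard-library `sqlite3` module with an in-memory database; the `customers` table, its columns, and the rows are hypothetical stand-ins for the sample on the slide.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           customer_id INTEGER PRIMARY KEY,
           name        TEXT NOT NULL,
           city        TEXT NOT NULL,
           age         INTEGER NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO customers (customer_id, name, city, age) VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Bangalore", 34),
        (2, "Ben", "Mumbai", 28),
        (3, "Chen", "Bangalore", 41),
    ],
)

# Fixed fields let a query group and aggregate against predetermined data types.
rows = conn.execute(
    "SELECT city, COUNT(*), AVG(age) FROM customers GROUP BY city ORDER BY city"
).fetchall()
for city, count, avg_age in rows:
    print(city, count, avg_age)
```

This prints `Bangalore 2 37.5` and `Mumbai 1 28.0`, which is exactly the kind of predetermined report that structured data makes easy.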
  • Unstructured data is a set of data with a complex structure that may or may not have a repeating pattern. It:
    - Typically consists of metadata
    - Comprises inconsistent data
    - Consists of data in different formats, such as e-mails, text, audio, video, or image files
  • Some sources of unstructured data include:
    - Text internal to an organization: think of documents, logs, e-mails, etc.
    - Data from social media
    - Mobile data
  • A good example of the use of unstructured data comes from supermarkets. Unstructured visual information from CCTV footage (where customers pause, how they behave at a bottleneck, how they navigate through the store) is combined with structured data, such as billing-counter records and product data, to build a complete, data-driven picture of customer behavior. This can be used to create a better shopping experience for the customer and, of course, to generate more sales for the store.
  • By a commonly cited estimate, about 80 percent of enterprise data consists of unstructured content. Unstructured data typically has little or no predetermined form, which gives users wide scope to structure it according to their needs. This makes it a weapon of choice for gaining considerable competitive advantage and a more holistic picture of future prospects.
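Structuring unstructured content "according to your needs" can be as simple as pulling chosen fields out of free text. The Python sketch below extracts a few fields from an e-mail body; the `ORD-` order-number pattern and the complaint keyword list are hypothetical, and the keyword test is a crude substring match rather than real natural language processing.

```python
import re

# Hypothetical pattern: order IDs written as "ORD-" followed by digits.
ORDER_ID = re.compile(r"\bORD-(\d+)\b")
COMPLAINT_WORDS = ("late", "broken", "refund")  # hypothetical keyword list

def structure_email(body):
    """Pull a few fields out of a free-text e-mail body."""
    lowered = body.lower()
    return {
        "order_ids": ORDER_ID.findall(body),
        # Crude substring match: "late" would also match "latest"
        "complaint": any(word in lowered for word in COMPLAINT_WORDS),
        "word_count": len(body.split()),
    }

email = "Hi, my parcel for ORD-1042 arrived late and the lid was broken. Please advise."
print(structure_email(email))
```

The same e-mail could be structured differently for another purpose (for example, by product or by sentiment), which is the flexibility the text refers to.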
  • The table on your screen shows the results of a survey conducted to identify the challenges associated with unstructured data. The survey reveals that the volume of data is the biggest challenge, followed by the infrastructure required to manage this volume. Managing unstructured data is also difficult because it is not easy to identify.
  • Semi-structured data, also known as schema-less or self-describing data, is a form of structured data that contains tags or markup elements to separate semantic elements and to create hierarchies of records and fields within the data. This type of data does not follow the rigid structure of data models such as those used in relational databases.
  • Semi-structured data is typically fed electronically from database systems, file systems, and data exchange formats, including scientific data and XML (eXtensible Markup Language). XML enables data to have an elaborate, intricate structure that is significantly richer and comparatively more complex.
  • An example of semi-structured data is shown on your screen. It indicates that entities belonging to the same class can have different attributes even if they are grouped together.
    Now that we have examined the way data arrives and is presented, let us move on to the elements that characterize this data.
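The idea that entities of the same class can carry different attributes is easy to see in XML. The Python sketch below parses a small, hypothetical XML document with the standard-library `xml.etree.ElementTree` module: both elements are `<customer>` records, but each has a different set of child tags, and the tags themselves describe the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: both <customer> elements belong to the same class,
# but each carries a different set of child tags.
xml_text = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ben</name>
    <phone>+91-00000-00000</phone>
    <city>Mumbai</city>
  </customer>
</customers>
"""

def parse_customers(text):
    """Read each <customer> into a dict keyed by its own (self-describing) tags."""
    root = ET.fromstring(text.strip())
    records = []
    for customer in root.findall("customer"):
        fields = {child.tag: child.text for child in customer}
        fields["id"] = customer.get("id")
        records.append(fields)
    return records

for record in parse_customers(xml_text):
    print(record)
```

No fixed schema is needed to read the data, but downstream code must cope with fields that are present for some records and missing for others.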
  • Big Data primarily consists of the following three elements:
    - Volume
    - Velocity
    - Variety
    Let's now take a more detailed look at each of these elements.
  • Volume is the amount of data generated by organizations or individuals. Today, the volume of data is approaching exabytes, and some experts predict that it will reach zettabytes in the coming years.
    Think about the numbers: Google processes around 20 petabytes in a single day, while Twitter feeds generate around 80 MB per second.
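Putting the two quoted figures on a common time scale makes them easier to compare. This Python sketch simply converts them, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes); the input numbers are the ones quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted above, in decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes)
google_bytes_per_day = 20 * 10**15
twitter_bytes_per_second = 80 * 10**6

google_gb_per_second = google_bytes_per_day / SECONDS_PER_DAY / 10**9
twitter_tb_per_day = twitter_bytes_per_second * SECONDS_PER_DAY / 10**12

print(f"Google: about {google_gb_per_second:.0f} GB per second")  # about 231 GB per second
print(f"Twitter: about {twitter_tb_per_day:.2f} TB per day")      # about 6.91 TB per day
```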
  • Velocity describes the rate at which data is generated, captured, and shared. Enterprises can capitalize fully on data only if it is captured and shared in real time.
  • Existing systems such as CRM and ERP struggle with the speed of data, which accumulates continuously and cannot be attended to quickly. These systems can process data only in batches every few hours; the time lag causes the data to lose its importance, and in the meantime new data is constantly being generated. eBay, for example, analyzes 5 million transactions per day in real time to detect fraud arising from the use of PayPal.
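The difference between batch and real-time handling can be sketched in Python: instead of collecting a few hours of transactions and then scanning them, each event is checked the moment it arrives. The "more than N transactions per account within a sliding time window" rule below is a hypothetical illustration, not eBay's or PayPal's actual method; real fraud systems combine many signals.

```python
from collections import deque

SECONDS_PER_DAY = 86_400
transactions_per_day = 5_000_000
avg_rate = transactions_per_day / SECONDS_PER_DAY
print(f"Average rate: {avg_rate:.1f} transactions per second")  # about 57.9

class VelocityCheck:
    """Flag an account that makes more than `limit` transactions within `window` seconds.

    Hypothetical rule for illustration only. Assumes events arrive in time order.
    """

    def __init__(self, limit=3, window=60):
        self.limit = limit
        self.window = window
        self.recent = {}  # account -> deque of recent timestamps

    def observe(self, account, timestamp):
        times = self.recent.setdefault(account, deque())
        times.append(timestamp)
        # Drop events that have fallen out of the sliding window
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit

checker = VelocityCheck(limit=3, window=60)
events = [("acct-9", 0), ("acct-9", 10), ("acct-9", 20), ("acct-9", 30), ("acct-7", 35)]
flags = [checker.observe(acct, ts) for acct, ts in events]
print(flags)  # [False, False, False, True, False]
```

The fourth transaction from the hypothetical account `acct-9` is flagged as it arrives, rather than hours later when a batch job runs.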
  • A pool of data from social, machine, and mobile sources keeps adding new data types and varieties of data to traditional transactional data. As a result, data is no longer organized in any predefined form and includes new types such as weblog data, machine data, mobile data, sensor data, social data, and text data.
  • In this section, we will look at Big Data applications in business analytics and at career prospects in Big Data.
  • Now we will study in detail the application of Big Data in Business Analytics.
  • Data, which is available in abundance, can be streamlined and exploited for growth and expansion in technology as well as business. When data is analyzed successfully, it can answer an important question: how can businesses acquire more customers and gain business insight? The key lies in being able to source, link, understand, and analyze data.
  • Take a look at this table highlighting the business areas that have benefited from using Big Data, and the proportion of each.
  • Let's now take a quick look at businesses and industries that are affected by, and benefit from, Big Data analytics. Sectors such as computer and electronic products and IT have experienced tremendous growth in sales, while sectors such as finance, insurance, and government have developed accurate assessment techniques.
  • Big Data has transformed transportation by providing improved traffic information and enabling autonomous features.
  • Big Data has transformed the modern-day education process through innovative approaches that let teachers analyze students' ability to comprehend, and thus teach effectively in accordance with each student's needs.
  • The travel industry, too, is using Big Data to conduct business. Most airlines are working toward customer satisfaction by doing more to remember personal preferences. Such customization goes well beyond mileage-based loyalty reward programs. Airlines also apply analytics to pricing, inventory, and advertising to improve customer experience, which leads to greater customer satisfaction and, hence, more business. A similar story can be seen in the hotel industry.
  • The study and analysis of available data is allowing governments to make informed decisions for fraud management, discover unknown threats, ensure security of global supply chain by monitoring global cargo traffic, use budgets more judiciously, analyze risks, and lots more.
  • In healthcare, physicians can use Big Data to determine the clinical protocols most likely to lead to the best health outcomes for patients.
  • Now that you know that Big Data is really BIG in today's world, you can well understand that the opportunities associated with it are just as big.
  • Qualified and experienced Big Data professionals must have a blend of technical expertise, creative and analytical thinking, and communication skills to effectively collate, clean, analyze, and present information extracted from Big Data.
  • Most jobs in Big Data are with companies that can be categorized into the following four broad buckets:
    1. Big Data technology drivers, e.g., Google
    2. Big Data product companies, e.g., Oracle
    3. Big Data services companies, e.g., EMC
    4. Big Data analytics companies, e.g., Splunk
  • The flowchart should give you a fairly accurate idea of the step-by-step progression you can expect from a Big Data certification program, either as an analyst or as a developer.
  • A Big Data analyst should possess the following technical skills:
    - Understanding of Hadoop, Hive, and MapReduce
    - Knowledge of natural language processing
    - Knowledge of statistical analysis and analytical tools
    - Knowledge of conceptual and predictive modeling
  • Organizations look for professionals who have good logical and analytical skills, good communication skills, and an affinity for strategic business thinking. <br /> The preferred soft-skill requirements for a Big Data professional are:
  • Most organizations today consider data and information to be their most valuable and differentiating asset, second only to their employees. <br /> By analyzing this data effectively, organizations worldwide are now finding new ways to compete and emerge as leaders in their fields, to improve decision-making, and to enhance performance. At the same time, with the volume and variety of data increasing at immense speed every day, the global phenomenon of using Big Data to gain business value and competitive advantage will only continue to grow.
  • To sum up, by analyzing data effectively, organizations worldwide are now finding new ways to compete and emerge as leaders in their fields, to improve decision-making, and to enhance performance. At the same time, with the volume and variety of data increasing at immense speed every day, the global phenomenon of using Big Data to gain business value and competitive advantage will only continue to grow.
  • In this class, we’ll look at the significance of social network data in the business context. <br />   <br /> The previous class gave you a broad idea about “Big Data” and how it affects our lives. In a sense, the data is only as good as the insights provided by it.
  • Human beings are social animals and cannot live in isolation. A human being gains knowledge, learns to communicate and think, work and play, by living in a social environment.
  • Today, socialization is not restricted to meeting and communicating with others in person. The usage of mobile phones and the Internet has made communication across the globe fast and easy. These also make socialization and the sharing of information both affordable and easily accessible.
  • Twitter, Facebook, and LinkedIn are currently some of the most popular social networking sites; together, they make up much of what we call social media. This session analyzes the Big Data generated by social media and its implications for various industries.
  • In this topic, we will understand: <br /> - The significance of social network data <br /> - Financial fraud and Big Data <br /> - Fraud detection in insurance <br /> - The use of Big Data in the retail industry
  • In this session, we’ll look at what Social Network Data is and what Social Network Analysis is. We will also address the uses of Social Network Data Analysis and, lastly, what Sentiment Analysis is. <br />
  • Social network data is the data generated when people socialize or communicate through social media.
  • As you can see, on social networking sites, numerous people constantly add and update their comments, likes, preferences, sentiments, and feelings, and thereby generate huge volumes of data. When mined and analyzed, this data reveals collective views and trends regarding the likes and dislikes, wants, and preferences of a large population.
  • This collective data can also be segregated and analyzed in terms of various groups of people, such as people belonging to various age groups, genders, and locations around the world. This information enables organizations to design and tailor products and services that people want. Such is the importance of social network data.
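The idea of segregating collective data by group can be illustrated with a simple group-by aggregation. The sketch below is a minimal example under stated assumptions: the post records and their fields (`age_group`, `gender`, `location`, `liked`) are hypothetical, and a production system would run the same kind of aggregation on a distributed engine rather than in a Python loop.

```python
from collections import defaultdict

# Hypothetical social posts: who posted, and whether they liked a product.
posts = [
    {"age_group": "18-24", "gender": "F", "location": "IN", "liked": True},
    {"age_group": "18-24", "gender": "M", "location": "US", "liked": False},
    {"age_group": "25-34", "gender": "F", "location": "IN", "liked": True},
    {"age_group": "25-34", "gender": "F", "location": "US", "liked": True},
    {"age_group": "18-24", "gender": "F", "location": "US", "liked": True},
]

def like_rate_by(records, key):
    """Return the share of positive ('liked') posts for each value of `key`."""
    totals = defaultdict(int)
    likes = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        likes[r[key]] += r["liked"]  # True counts as 1, False as 0
    return {group: likes[group] / totals[group] for group in totals}

print(like_rate_by(posts, "age_group"))
print(like_rate_by(posts, "location"))
```

Swapping the `key` argument gives the same view by age group, gender, or location.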
  • Have a look at this image
  • Now let’s look at what Social Network Analysis is.
  • Social Network Analysis or SNA refers to the analysis of the data generated in social networks. As the data used is massive, it leads to a Big Data situation.
  • Let’s consider the example of a mobile network operator (MNO) to understand the value of social network data. The complete set of call and text message records captured by an MNO is huge. Such data is routinely used for a variety of purposes: for example, to tailor offers to customers or to provide relevant services that they use regularly.
  • In this example, we will see how social network analysis takes data analysis up a notch by looking at several degrees of association instead of just one. That is how social network analysis can turn a simple data source into a Big Data source.
  • This figure represents a caller’s social network. It is possible to go as many layers deep as the analysis can handle to get the complete picture of a social network. The need to go deeper from customer to customer and call to call for several layers makes the volume of data massive. It also increases the difficulty in analyzing it, particularly when it comes to using traditional methods. <br />   <br /> It works the same way within social networking sites. When analyzing a member of a social network, it isn’t that hard to identify how many connections a member has, how often messages are posted, and other standard metrics.
  • However, working out how wide a member’s network is when it includes friends, friends of friends, and friends of friends of friends is a lot more work: it is a Big Data problem.
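The jump from counting direct connections to counting friends of friends can be shown with a breadth-first search that tallies how many new members appear at each layer. This is a minimal sketch on a tiny, hypothetical call graph (`calls`); real telecom or social graphs have millions of nodes, which is why each added layer becomes a Big Data workload.

```python
from collections import deque

# Hypothetical call graph: who has called or messaged whom (undirected).
calls = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "F"],
    "D": ["B", "G"],
    "E": ["B"],
    "F": ["C", "G", "H"],
    "G": ["D", "F"],
    "H": ["F"],
}

def reach_by_depth(graph, start, max_depth):
    """Breadth-first search from `start`; return how many new members appear at each depth."""
    seen = {start}
    frontier = deque([(start, 0)])
    counts = [0] * (max_depth + 1)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of layers
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                counts[depth + 1] += 1
                frontier.append((neighbour, depth + 1))
    return counts[1:]  # index 0 would be the start member itself

print(reach_by_depth(calls, "A", 3))
```

On this toy graph, `reach_by_depth(calls, "A", 3)` returns `[2, 3, 2]`: two direct contacts, three second-degree contacts, and two third-degree contacts. Each extra hop multiplies the records that must be read.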
  • What are the uses of Social Network Data Analysis?
  • By using social network data analysis, decision-making can be improved in the following areas: <br /> Business Intelligence <br /> Marketing <br /> Product Design and Development <br /> <br /> Let’s look at each of these in a little more detail.
  • Let’s look in detail at how it helps in Business Intelligence. <br /> You can analyze data generated from social networks to get high-value business insights.
  • Social Customer Relationship Management (CRM) is a buzzword these days. This analysis can change the way organizations value their customers. Rather than considering only a single customer’s value, it is now possible to evaluate the value of the customer’s overall network.
  • Let’s consider the example of a mobile service provider which has a relatively low-value customer as a subscriber. The customer has a basic call plan, which does not generate any additional revenue. The customer is barely profitable. The service provider would traditionally have valued this customer on the basis of his or her individual account and hence may not have been too worried if the customer had wanted to leave.
  • With social network analysis, however, it is possible to identify that the same customer can influence people in his or her network who are heavy users and who have a wide network of friends. This may persuade the company to make an altogether different business decision and value the customer more. <br /> <br /> One reason is that studies have shown that once a member of a calling circle leaves, the others are likely to follow. Using social network analysis, it is possible to understand the potential value that customers influence, rather than only the revenue they generate directly. This gives a completely different perspective on how the customer needs to be handled.
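One way to express the shift from account value to network value is to add a weighted share of the revenue of a customer's contacts to the customer's own revenue. The sketch below is a simplified, hypothetical scoring rule: the revenue figures, the calling circles, and the 0.2 influence weight are all illustrative assumptions, not figures from any operator or study.

```python
# Hypothetical monthly revenue per subscriber (illustrative numbers only).
revenue = {"low_value": 5.0, "heavy_1": 60.0, "heavy_2": 45.0, "heavy_3": 50.0, "other": 8.0}

# Hypothetical calling circles: the direct contacts of each subscriber.
circle = {
    "low_value": ["heavy_1", "heavy_2", "heavy_3"],
    "other": ["low_value"],
}

def network_value(customer, influence=0.2):
    """Own revenue plus an assumed share (`influence`) of the revenue of direct contacts.

    The default weight of 0.2 is an illustrative assumption; in practice it would
    have to be estimated from real churn and usage data.
    """
    own = revenue[customer]
    contacts_revenue = sum(revenue[c] for c in circle.get(customer, []))
    return own + influence * contacts_revenue

for name in ("low_value", "other"):
    print(name, revenue[name], round(network_value(name), 2))
```

Under these assumptions, the barely profitable subscriber, worth 5.0 on its own, scores about 36 once its heavy-user contacts are counted.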
  • Law enforcement and anti-terrorism efforts also leverage social network analysis today. It is possible to recognize individuals who are connected, directly or indirectly, to known trouble groups or persons. Analysis of this type is often referred to as link analysis.
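Link analysis of this kind is, at its core, a shortest-path search: starting from a person of interest, find the shortest chain of contacts that reaches anyone on a watchlist. The sketch below uses breadth-first search on a hypothetical contact graph; the graph, the watchlist, and the `max_hops` limit are assumptions for illustration. A match only shows that a connection exists and is a lead for investigators, not evidence of wrongdoing.

```python
from collections import deque

# Hypothetical contact graph and watchlist.
contacts = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "X"],
    "X": ["P3"],
    "P4": ["P5"],
    "P5": ["P4"],
}
watchlist = {"X"}

def link_path(graph, person, targets, max_hops=3):
    """Return the shortest chain of contacts from `person` to anyone in `targets`,
    or None if no chain exists within `max_hops` links."""
    queue = deque([[person]])
    visited = {person}
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            return path
        if len(path) - 1 == max_hops:
            continue  # this chain is already as long as allowed
        for contact in graph.get(path[-1], []):
            if contact not in visited:
                visited.add(contact)
                queue.append(path + [contact])
    return None

print(link_path(contacts, "P1", watchlist))
```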
  • From the examples above, we can infer the following business insights: <br /> <br /> Social network data analysis can provide new contexts in which decisions are data-driven rather than opinion-driven. <br /> <br /> Big Data analysis allows organizations to shift their goal from maximizing individual account profitability to maximizing the profitability of the customer’s network. <br /> <br /> Big Data helps organizations identify highly connected customers and decide when, where, and how to focus marketing efforts to build a better brand image.
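Identifying highly connected customers can start with the simplest network measure, degree: the number of direct connections each member has. The sketch below ranks members of a small, hypothetical friendship graph by degree; real analyses often use richer centrality measures, but the idea of ranking by connectedness is the same.

```python
# Hypothetical friendship graph (undirected, listed from both sides).
friends = {
    "asha": ["ben", "chen", "dev", "eli"],
    "ben": ["asha", "chen"],
    "chen": ["asha", "ben", "dev"],
    "dev": ["asha", "chen"],
    "eli": ["asha"],
}

def most_connected(graph, top_n=2):
    """Rank members by degree (number of distinct direct connections), highest first."""
    degrees = {member: len(set(contacts)) for member, contacts in graph.items()}
    # Sort by degree descending, then by name so ties have a deterministic order.
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

print(most_connected(friends))
```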
  • It enables organizations to attract highly connected customers with free trials and solicit their feedback to improve their products and services. <br /> <br /> It helps organizations encourage internal customers to become more active in giving feedback and opinions on the products or services.
  • Let’s look at how social network data analysis can improve decision-making in marketing.
  • Today’s consumers have changed. They no longer read newspapers end to end. They fast-forward through TV commercials and junk unsolicited e-mail, because they have many choices and new options that fit their digital lifestyle better. Consumers can now choose which marketing messages they receive: when, where, and from whom.
  • In today’s competitive scenario, marketers deliver what consumers want through relevant interactive communication across digital power channels: e-mail, mobile, social, and the Web.
  • These channels, in turn, generate the social data that provides insights into a target audience’s brand communication preferences, the tone of voice they use, their interests, the other brands they discuss, and plenty of other data that can help a brand tailor its consumer communication for maximum impact.
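Finding the other brands an audience discusses can be sketched as counting which brands are mentioned together in the same post. This minimal example assumes a hypothetical list of tracked brand names and a handful of sample posts, and uses plain word matching; real systems would also need to handle misspellings, hashtags, and brand names that are ordinary words.

```python
import re
from collections import Counter
from itertools import combinations

BRANDS = {"acme", "globex", "initech"}  # hypothetical brands being tracked

posts = [
    "Loving my new Acme phone, way better than Globex",
    "Globex support was slow again",
    "Acme or Initech for a laptop? Leaning Acme",
    "Initech and Globex both had sales today",
    "Globex vs Acme prices compared",
]

def brand_mentions(text):
    """Return the set of tracked brands mentioned in a post (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & BRANDS

def co_mentions(texts):
    """Count how often each pair of brands appears in the same post."""
    pairs = Counter()
    for text in texts:
        for pair in combinations(sorted(brand_mentions(text)), 2):
            pairs[pair] += 1
    return pairs

for (a, b), count in co_mentions(posts).most_common():
    print(f"{a} + {b}: {count}")
```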
  • Social network analysis of this data is used widely in marketing, in various interesting ways.
  • Let’s look at how retail giant Walmart is using social media to understand its customers better. Walmart recently acquired a social media analytics company named Kosmix and created Walmart Labs, a division that analyzes media communication to understand retail trends. One of the key responsibilities of this division is to monitor public-domain conversations and then position Walmart products accordingly.
  • Affiliate marketing is a reward-based marketing structure in which an affiliated company uses its own marketing effort to bring customers to another company and, in turn, is rewarded by the company that benefits. The Brandlove app is one example. Today, one would be hard-pressed to find a major brand that does not have a thriving affiliate program.
  • Let’s now look at how social network data analysis can improve decision-making in product design and development.
  • Millions of status updates, blog posts, photographs, and videos are shared every second.
  • To be successful, organizations need not only to identify the information relevant to their company, products, and services but also to dissect, comprehend, and respond to that information in real time and on a continuous basis.
  • A system that can represent sentiment as data with increasing accuracy gives the client a way to tap into the information on a social platform. Measuring sentiment more closely is of great value in designing a product or service. It is also important for brands to be able to understand the demographic information they receive.
  • Let’s now look at what Sentiment Analysis is. <br />
  • Sentiment analysis is a computational technique for analyzing the emotions, attitudes, and views people express across popular social networks, including Facebook, Twitter, and blogs. The technique requires analytical skill as well as computing expertise.
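A very simple form of sentiment analysis is lexicon-based scoring: count positive and negative words, flipping the polarity after a negator such as "not". The sketch below uses a tiny, hypothetical word list; it is meant to show the mechanics only, since production systems rely on much larger lexicons or trained models and still struggle with sarcasm and context.

```python
import re

# Tiny, hypothetical lexicon for illustration only.
POSITIVE = {"love", "great", "excellent", "happy", "fast"}
NEGATIVE = {"hate", "terrible", "slow", "delayed", "rude"}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """+1 per positive word, -1 per negative word; the sign flips when the
    previous word is a negator (e.g. 'not great' counts as negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def label(text):
    """Map the numeric score to a coarse sentiment label."""
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

for post in ["I love this phone, great battery", "The service was not great"]:
    print(label(post), "|", post)
```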
  • By listening to what consumers want, understanding where the gaps in the offering are, and so on, organizations can make the right decisions about the direction of their product development and offerings. In this way, social network data can help organizations improve their products and services, while making sure consumers ultimately get the products and services they want.
  • However, this technique is still evolving, and the full potential of sentiment analysis is yet to be explored by marketers and other business professionals. <br />
  • There’s also the issue of judgment. Think of a company that relies purely on its number of likes and followers to estimate its popularity. Deeper analysis could show that most of the conversation about the company is negative, yet the raw counts still create a false impression of its standing on social media.
  • American Airlines has been ranked one of the most disliked companies in the USA, but its social media presence tells a different story. The airline has about 346,259 followers on Twitter and 273,591 ‘likes’ on Facebook. Deeper analysis, however, shows that much of the online conversation about the company is negative, consistent with its reputation as one of the most disliked airlines. Hence, sentiment and emotive data should be given more importance than raw “followers” and “likes” counts.
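The contrast between follower counts and sentiment can be made concrete with a net sentiment score: the share of positive mentions minus the share of negative ones. The sketch below assumes each mention has already been labeled positive, negative, or neutral by some classifier; the follower and mention counts are hypothetical and are not American Airlines data.

```python
def net_sentiment(labels):
    """Return (positive - negative) / total mentions, a value between -1 and 1."""
    if not labels:
        return 0.0
    positive = labels.count("positive")
    negative = labels.count("negative")
    return (positive - negative) / len(labels)

# Hypothetical brand: a large audience, but mostly negative conversation.
followers = 300_000
mentions = ["negative"] * 6 + ["positive"] * 3 + ["neutral"] * 1

print(f"followers: {followers:,}")
print(f"net sentiment: {net_sentiment(mentions):+.2f}")
```

A large audience and a clearly negative net sentiment can coexist, which is why the two numbers answer different questions.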
  • In this topic, we looked at social network data and its analysis. We also addressed the uses of social network data analysis and how sentiment analysis helps in making better business decisions. <br />
  • In this class, we’ll look at the significance of social network data in the business context. <br /> <br /> The previous class gave you a broad idea about “Big Data” and how it affects our lives. In a sense, the data is only as good as the insights it provides; hence, it is important to understand how the data is actually used.
  • Now we will look at Financial Fraud and Big Data.
  • Fraud occurs frequently in banks and other financial institutions. These institutions send educational e-mails and other communications on how to prevent such fraud and avoid becoming a party to it.
  • Financial fraud is even more prevalent in the online retail sector. In such cases, online retailers, such as Amazon, eBay, and Groupon, tend to incur huge expenses and losses.
  • The following are the most common financial frauds that affect online retailers: <br /> <br /> Credit Card Fraud: This is a widespread and frequent fraud. The online retailer does not see the user of the card and hence cannot validate its ownership. It is also quite possible that a stolen or even fake card is used in a transaction. Despite the several checks in the online transaction process, not all the loopholes in the system are plugged. <br /> <br /> Exchange or Return Policy Fraud: Every online retailer has a policy on exchanges and returns, and this gives fraudsters a strong area in which to operate. <br /> <br /> Personal Information Fraud: Here, the customer’s login information is stolen; the fraudster then logs in, completes the entire sale transaction, and changes the delivery address to a different location.
  • The key to preventing these frauds is to understand customers’ ordering patterns and keep a lookout for red flags. <br />
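Watching a customer's ordering pattern for red flags can be sketched as an outlier test on order amounts: compare a new order with the customer's own history using a z-score. This is a minimal sketch; the order history and the threshold of 3 standard deviations are assumptions, and real systems combine many signals (items, addresses, devices, timing) rather than amount alone.

```python
from statistics import mean, stdev

def is_red_flag(history, new_amount, threshold=3.0):
    """Flag an order whose amount is unusually far from the customer's past orders.

    The z-score is how many standard deviations the new amount lies from the
    historical mean; the 3.0 threshold is an illustrative assumption.
    """
    if len(history) < 2:
        return False  # not enough history to define a pattern
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_amount != mu  # every past order was identical
    return abs(new_amount - mu) / sigma > threshold

past_orders = [40, 55, 38, 60, 47]  # hypothetical order amounts for one customer
print(is_red_flag(past_orders, 52))
print(is_red_flag(past_orders, 900))
```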
  • Big Data can be intelligently used not just to educate online retailers but also to manage and prevent fraud and losses in their business.
  • Let’s look at how this is possible. <br /> Analyzing data to understand the various patterns of fraud was one of many preventive methods, but it worked only on small samples. The sample size could not be increased, because that required huge investments of time and money. With Big Data techniques, however, this challenge can now be overcome. <br />
  • Big Data analytics can: <br /> Run checks on all of the data to identify any fraudulent transactions. <br /> Identify new ways of committing fraud and keep adding them to a set of fraud-prevention checks. <br /> Do this without impeding customers with unnecessary policies and governance structures. <br />
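Running every check on every transaction, and adding newly discovered fraud patterns to the set of checks, maps naturally onto a list of rule functions. The sketch below is a simplified illustration: the `Order` fields and the three rules are hypothetical examples (the address-change rule echoes the personal information fraud described earlier), not a vetted fraud policy. At Big Data scale the same loop would run on a distributed engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Order:
    order_id: str
    amount: float
    card_country: str
    ship_country: str
    address_changed_in_session: bool  # hypothetical field

Check = Callable[[Order], Optional[str]]
CHECKS: List[Check] = []

def register_check(check: Check) -> Check:
    """Add a fraud check to the set that every order is screened against."""
    CHECKS.append(check)
    return check

@register_check
def country_mismatch(order: Order) -> Optional[str]:
    return "card and shipping countries differ" if order.card_country != order.ship_country else None

@register_check
def address_changed(order: Order) -> Optional[str]:
    return "delivery address changed after login" if order.address_changed_in_session else None

def screen(orders: List[Order]) -> Dict[str, List[str]]:
    """Run every registered check on every order; return reasons for flagged orders only."""
    flagged: Dict[str, List[str]] = {}
    for order in orders:
        reasons = []
        for check in CHECKS:
            reason = check(order)
            if reason is not None:
                reasons.append(reason)
        if reasons:
            flagged[order.order_id] = reasons
    return flagged

# A newly discovered pattern is added without touching the screening loop.
@register_check
def unusually_large(order: Order) -> Optional[str]:
    return "order value above 1,000" if order.amount > 1000 else None
```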
  • Fraud Detection in Real Time <br /> To detect fraud in real time, Big Data analytics compares live transactions with various other sources of data to authenticate them online. For example, when a transaction is made online, Big Data would immediately enable a comparison between the incoming IP address and the geo-data from the customer’s smartphone apps. A match would authenticate the transaction. <br />
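The IP-versus-phone comparison can be sketched as a distance check between two geolocations. The sketch assumes both locations have already been resolved to latitude and longitude (the IP lookup and the phone app's location permission are outside its scope), and the 50 km tolerance is an illustrative assumption, since IP geolocation is often accurate only to the city or region level.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

def authenticate(ip_location, phone_location, max_km=50.0):
    """Accept a transaction only if the IP geolocation and the phone's reported
    location are within `max_km` of each other (threshold is an assumption)."""
    return haversine_km(*ip_location, *phone_location) <= max_km

# Hypothetical coordinates: two points in Bengaluru, and one in Mumbai.
print(authenticate((12.97, 77.59), (12.93, 77.62)))
print(authenticate((12.97, 77.59), (19.08, 72.88)))
```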
  • Big Data can also comb through historical data and indicate fraud patterns that are later used to create checks to prevent real-time fraud.
  • Retailers also use real-time analysis to know exactly when items were delivered to customers. High-value items carry attached sensors that can transmit their location. When such items are delivered, retailers process the streaming data from these sensors and use it to prevent fraud.
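Processing the sensor stream can be sketched as recording the first time each tracked item reports that it reached its delivery address, then checking later claims against that record. The reading format is hypothetical, and using it to review "item never arrived" claims is one plausible application assumed here for illustration; the notes do not specify which frauds the sensor data is used against.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class SensorReading:
    item_id: str
    timestamp: int              # seconds since epoch
    at_delivery_address: bool   # hypothetical flag derived from the sensor's location

def track_deliveries(stream: Iterable[SensorReading]) -> Dict[str, int]:
    """Consume readings in arrival order; record when each item first reached its delivery address."""
    delivered: Dict[str, int] = {}
    for reading in stream:
        if reading.at_delivery_address and reading.item_id not in delivered:
            delivered[reading.item_id] = reading.timestamp
    return delivered

def review_claim(delivered: Dict[str, int], item_id: str, claim_time: int) -> bool:
    """Flag a 'not received' claim for review if the sensor reported delivery before the claim."""
    t = delivered.get(item_id)
    return t is not None and t <= claim_time

readings = [
    SensorReading("tv1", 100, False),
    SensorReading("tv1", 200, True),
    SensorReading("cam9", 150, False),
]
print(review_claim(track_deliveries(readings), "tv1", 250))
```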
  • Visually Analyzing Frauds <br /> Big Data can facilitate drawing maps and graphs that create comparisons, which are then used to make decisions and create effective systems that are accurately placed to block fraud. An analysis in the graphical form, for example, can help identify the regions, customers, and the products that have a higher fraud rate. <br /> Big Data can even show comparisons between products and regions, and so on, which alerts the retailer on where a greater probability of fraud exists.
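Before any map or graph is drawn, the per-region or per-product comparison comes down to a grouped fraud rate. Here is a sketch, assuming each transaction is a dict with a boolean `is_fraud` label. The field names and sample data are hypothetical.

```python
from collections import defaultdict

def fraud_rate_by(transactions, key):
    """Group transactions by `key` (e.g. "region") and return {group: fraud rate}."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for tx in transactions:
        group = tx[key]
        totals[group] += 1
        if tx["is_fraud"]:
            frauds[group] += 1
    return {group: frauds[group] / totals[group] for group in totals}

def rank_by_rate(rates):
    """Sort groups from highest to lowest fraud rate, ready for charting."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical labelled transactions
transactions = [
    {"region": "North", "product": "phone", "is_fraud": True},
    {"region": "North", "product": "phone", "is_fraud": False},
    {"region": "South", "product": "shoes", "is_fraud": False},
    {"region": "South", "product": "phone", "is_fraud": False},
]

region_rates = fraud_rate_by(transactions, "region")
product_rates = fraud_rate_by(transactions, "product")
print(rank_by_rate(region_rates))  # [('North', 0.5), ('South', 0.0)]
```

The ranked output is what a bar chart or heat map would then visualize. The same function works for any grouping column.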
  • Let's assume that an insurance company wants to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time. At the same time, the company faces a steady increase in the cost of litigation and fraudulent claims. The company has policies and procedures to help underwriters evaluate claims for fraud. However, the underwriters do not have the required data at the right time to make the necessary decisions, which further delays processing.
  • Within this context, the company implements a Big Data analytical platform that uses data from social media to provide a real-time view. When a customer first calls in with a claim, this enables a call center agent to identify behavior patterns and relationships with other claimants. The agent can then leave a note for the underwriters to review.
  • In some cases, social media can also provide strong triggers for identifying fraud. For example, a customer might state that his or her car was destroyed in a flood, but the social media feed shows that the car was actually in another city on the day of the flood. Such glaring discrepancies point to fraud.
  • Insurance fraud has a huge cost implication for an organization, which is why organizations prefer to use Big Data analytics and other advanced technologies to tackle it. This also benefits customers, because fraud losses are otherwise passed on to them as higher premiums.
  • After implementing a Big Data analytics platform, organizations are able to analyze complex information and accident scenarios in minutes rather than days or months.
  • Fraud Detection Methods
    Traditionally, insurance companies have used statistical models to identify fraudulent claims. These models have many limitations and can prevent fraud only to a certain extent. This section examines these limitations and how Big Data can overcome them.
    Insurance companies typically analyze small samples of data, which means that some frauds go undetected. This method relies on previously recorded fraud cases. Therefore, every time a fraud based on a new technique occurs, insurance companies have to bear the consequences and the losses the first time around.
    The traditional method of identifying fraud works in independent silos. It cannot handle information from different channels and different functions in an integrated way. Big Data analytics, on the other hand, can handle this kind of challenge.
  • Public data such as bank statements, legal judgments, criminal records, and medical bills can be useful inputs to predictive analysis for preventing fraud.
    To get the most predictive value from such data, business organizations integrate their internal data with third-party data. This integration helps in investigating and restricting fraudulent activities.
  • Social Network Analysis
    Earlier, we learned about social network analysis (SNA) and how Big Data can be used to create visibility into businesses' blind spots. SNA is an innovative and effective way to detect fraud.
  • Consider an example. Suppose that after an accident, everyone involved exchanges addresses and phone numbers and gives them to the insurer. The address given by one of the accident victims might turn up in several other claims, or the vehicle might be found to have been involved in other claims as well. Either finding automatically signals a possible fraudulent claim. The ability to source this information can help catch such fraudulent claims faster.
  • The SNA tool uses a mix of analytical methods: statistical methods, pattern analysis, and link analysis. Together, these sift through large amounts of data to reveal relationships.
  • When link analysis is used in fraud detection, one looks for clusters of data and how those clusters are linked to other clusters. As mentioned earlier, public records are among the data sources that can be integrated into a model. By integrating various data sources into a model in this way, the insurer can rate claims.
  • A high rating indicates that the claim is likely to be fraudulent. This might be because of a known bad address, a suspicious provider, or a vehicle that was involved in many accidents with multiple carriers.
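The idea of linking claims through shared attributes and rating them can be sketched in a few lines. This is a simplified stand-in for link analysis, not the SNA tool's actual method. It links claims only through a shared address or vehicle. The field names, the known-bad-address list, and the penalty weight are all hypothetical.

```python
from collections import defaultdict

KNOWN_BAD_ADDRESSES = {"12 Elm St"}  # hypothetical watch list
BAD_ADDRESS_WEIGHT = 5               # hypothetical penalty

def rate_claims(claims):
    """Rate each claim by how many other claims share its address or vehicle,
    plus a penalty for a known bad address. Higher means more suspicious."""
    # Index each (attribute, value) pair to the set of claims that use it.
    index = defaultdict(set)
    for c in claims:
        index[("address", c["address"])].add(c["id"])
        index[("vehicle", c["vehicle"])].add(c["id"])

    ratings = {}
    for c in claims:
        linked = index[("address", c["address"])] | index[("vehicle", c["vehicle"])]
        score = len(linked - {c["id"]})  # other claims linked to this one
        if c["address"] in KNOWN_BAD_ADDRESSES:
            score += BAD_ADDRESS_WEIGHT
        ratings[c["id"]] = score
    return ratings

# Hypothetical claims
claims = [
    {"id": "C1", "address": "12 Elm St", "vehicle": "KA01AB1234"},
    {"id": "C2", "address": "12 Elm St", "vehicle": "KA02CD5678"},
    {"id": "C3", "address": "7 Oak Ave", "vehicle": "KA01AB1234"},
    {"id": "C4", "address": "3 Pine Rd", "vehicle": "KA09ZZ0001"},
]
print(rate_claims(claims))  # {'C1': 7, 'C2': 6, 'C3': 1, 'C4': 0}
```

C1 scores highest because it shares an address with C2, shares a vehicle with C3, and uses a watch-listed address. A fuller version would follow links transitively to find whole clusters of connected claims.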
  • Before implementing SNA, however, organizations should consider the following questions carefully:
    1. How fast does the data arrive?
    2. How much irrelevant data arrives with it?
    3. How deep should the analysis go to produce the most accurate results?
    4. Which user interface components should the SNA dashboard include?
  • The following is the step-by-step SNA method for detecting fraud:
    1. Structured and unstructured data from various sources is fed into an ETL (Extract, Transform, and Load) tool. The data is then transformed and loaded into a data warehouse.
    2. The analytics team uses information from various sources to score the risk of fraud and rank the likelihood of fraud. This information can include prior beliefs, previous relationships, the number of rejected claims, and so on.
    3. Several Big Data technologies, including text mining, sentiment analysis, content categorization, and social network analysis, can be incorporated into the fraud detection and predictive modeling mechanism.
    4. Depending on the score of a particular network, an alert is generated.
    5. Investigators can then use this information to research the suspicious claim further.
    6. Finally, the fraud issues that are identified are added to the case system.
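Steps 2 and 4 above (score, rank, alert) can be sketched as a weighted sum with a threshold. The feature names, weights, and threshold are hypothetical placeholders. A real system would derive them from historical data and from the analyses in step 3. Steps 5 and 6 remain human investigation and are not modelled.

```python
# Hypothetical weights for the kinds of evidence mentioned in step 2.
WEIGHTS = {
    "prior_relationship": 3.0,  # 1 if linked to a previously flagged party, else 0
    "rejected_claims": 1.5,     # number of previously rejected claims
    "negative_sentiment": 1.0,  # 1 if text/sentiment analysis raised a concern (step 3)
}
ALERT_THRESHOLD = 5.0  # hypothetical cut-off for step 4

def risk_score(features):
    """Step 2: weighted sum of the evidence for one claim or network."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

def triage(claims):
    """Rank claims by risk (step 2) and return those that trigger an alert (step 4)."""
    ranked = sorted(((cid, risk_score(f)) for cid, f in claims.items()),
                    key=lambda item: item[1], reverse=True)
    alerts = [(cid, score) for cid, score in ranked if score >= ALERT_THRESHOLD]
    return ranked, alerts

# Hypothetical feature values per claim
claims = {
    "C1": {"prior_relationship": 1, "rejected_claims": 2, "negative_sentiment": 0},
    "C2": {"prior_relationship": 0, "rejected_claims": 1, "negative_sentiment": 1},
    "C3": {"prior_relationship": 1, "rejected_claims": 0, "negative_sentiment": 1},
}
ranked, alerts = triage(claims)
print(ranked)  # [('C1', 6.0), ('C3', 4.0), ('C2', 2.5)]
print(alerts)  # [('C1', 6.0)]
```

A linear score is used here only because it is easy to read. Production systems often use trained statistical models for this step instead.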
  • Predictive Analysis
    Predictive analysis works on the principle that the earlier fraud is detected, the smaller the loss a business incurs.
  • Think about a situation where a customer raises a claim saying his car caught fire. But recorded statements indicate that most of the valuable items in the car had been removed prior to the fire. This could raise the suspicion that the car had been torched on purpose.
  • Predictive analytics includes the use of text analytics and sentiment analysis to examine Big Data for fraud detection. Claim reports often run to multiple pages, which makes it hard to spot a scam easily. Big Data analytics helps sift through this unstructured data, which was not possible earlier, and helps detect fraud proactively.
    Predictive analytics technology is increasingly being used to spot potentially fraudulent claims and to speed up the payment of legitimate claims.
  • Here's how predictive analytics technology works:
    Claim adjusters write lengthy reports while investigating a claim. These reports often contain clues that the adjuster may not have noticed.
    A computing system based on business rules highlights these clues as possible signs of fraud.
    The fraud detection system can spot these discrepancies and flag the claim as potentially fraudulent.
  • Social Customer Relationship Management (CRM)
    Social CRM enables effective fraud detection in the insurance sector. Social CRM is neither a platform nor a technology, but a process. It requires insurance companies to link social media sites, such as Facebook and Twitter, to their CRM systems.
  • When social media is integrated within an organization, it enables greater transparency with customers. Mutually beneficial transparency indicates that the company trusts its customers and vice versa. This customer-centric ecosystem reflects the fact that customers are increasingly in control. The ecosystem can benefit the business as well, if the business is able to leverage the collective intelligence of its customer base.
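The rules-based highlighting step can be sketched with regular expressions run over each sentence of an adjuster's report. The three rules here are hypothetical examples, and one of them echoes the car-fire scenario above. Real rule sets would be written by the insurer's domain experts.

```python
import re

# Hypothetical business rules: rule name -> pattern suggesting a suspicious detail.
RULES = {
    "items_removed_before_loss": re.compile(r"\bremoved\b.*\b(before|prior to)\b", re.IGNORECASE),
    "coverage_recently_increased": re.compile(r"\bcoverage\b.*\bincreased\b", re.IGNORECASE),
    "no_witnesses": re.compile(r"\bno witness(es)?\b", re.IGNORECASE),
}

def flag_clues(report):
    """Return (rule_name, sentence) for every sentence that matches a rule."""
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    return [(name, s) for s in sentences for name, rule in RULES.items() if rule.search(s)]

def should_flag(report, min_hits=1):
    """Flag the claim for review if at least `min_hits` clues are found."""
    return len(flag_clues(report)) >= min_hits

# Hypothetical adjuster report
report = ("The insured stated that the car caught fire overnight. "
          "Most valuables had been removed prior to the fire. "
          "There were no witnesses at the scene.")
for rule_name, sentence in flag_clues(report):
    print(rule_name, "->", sentence)
```

Returning the matching sentence alongside the rule name lets an investigator see exactly which part of a long report raised the flag.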
  • Today we will discuss the usage of Big Data in the retail industry.
  • Big Data has huge potential for the retail industry as well, given the immense number of transactions and the correlations among them.
  • Seemingly simple questions are easy to answer when there is a single retail location and a small customer base. How many basic tees did we sell today? What time of year do we sell the most leggings? What else has customer X bought, and what kind of coupons can we send to customer X?
  • But in larger systems, with millions of transactions being carried out daily, spread across multiple disconnected legacy systems and IT teams, it is impossible to see the full picture of the data.
  • Linking a company's in-store sales with its online sales can lead to deep insights into customer behavior and overall company health. However, this information is often so hard to pull together that the issue goes unaddressed. Retail stores typically run on legacy point-of-sale systems that batch updates daily and often do not communicate with each other, let alone with the e-commerce site.
    Consider a marketing analyst trying to understand the strength and health of their products or campaigns. For that analyst, reconciling these systems and their different data can be an impossible task. Omni-channel retailing solutions do exist, but they require both store managers and web developers to learn entirely new systems. This incurs huge costs in time and money for company-wide training and systems deployment. Further, accessing data in real time is often not possible, as systems hit scaling issues.
  • Suppose you want to know whether a particular item is in stock at another nearby store. This information is often not readily available and requires phone calls or other communication. That adds time to the transaction and can prevent an immediate sale.
  • As retailers such as Walmart and Amazon grow bigger and broader through technology, the task of tracking shipping and production also grows significantly. In these scenarios, Big Data proves to be of immense help. Data from innovative solutions such as tagging is used for analysis. These tags can generate a lot of data that can be analyzed to provide various solutions, some of which are discussed in the next section.
  • But remember that most Big Data is neither required nor useful. Within a Big Data feed, some information will have long-term strategic value, some will be useful only for immediate, tactical use, and some won't be used for anything at all. A key part of taming Big Data is determining which pieces fall into which category.
  • Use of RFID (Radio Frequency Identification) Data in Retail
    An RFID tag is a small tag that carries a unique code identifying a product, much like a UPC code. The tag is placed on shipping pallets or product packages, as shown in the adjacent image.
  • In addition to what a bar code offers, an RFID tag:
    Specifies that the pallet is allotted to a precise, exclusive set of computer systems.
    Helps detect situations where an item has no units left in the store.
    Tracks the number of units of each item remaining in the store, and thereby raises an alarm when restocking is required.
    Allows better tracking of products by distinguishing products that are out of stock from products that are available on the shelf. For example, a product that is unavailable on the shelf is not necessarily unavailable everywhere. Using an RFID reader and a mobile computer, stock can be located in the warehouse and replenished immediately.
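The restocking alarm can be sketched as counting unique tag reads per product and comparing each count with a reorder level. The product codes, reorder levels, and the `(tag_id, product_code)` shape of the read data are assumptions for illustration.

```python
from collections import Counter

# Hypothetical reorder levels: raise an alarm at or below this many units.
REORDER_LEVEL = {"tee-basic": 3, "leggings": 2}

def shelf_counts(tag_reads):
    """Count units per product from one scan of (tag_id, product_code) reads.
    A tag read more than once is counted once, since each tag is one unit."""
    unique_tags = dict(tag_reads)  # tag_id -> product_code
    return Counter(unique_tags.values())

def restock_alarms(counts, reorder_level=REORDER_LEVEL):
    """Products whose count is at or below the reorder level, including
    products with no units left at all."""
    return sorted(p for p, level in reorder_level.items() if counts.get(p, 0) <= level)

# Hypothetical scan: tag T1 is read twice, and no leggings are on the shelf
reads = [("T1", "tee-basic"), ("T2", "tee-basic"), ("T1", "tee-basic"),
         ("T3", "tee-basic"), ("T4", "tee-basic")]
counts = shelf_counts(reads)
print(counts["tee-basic"])     # 4
print(restock_alarms(counts))  # ['leggings']
```

De-duplicating by tag ID matters because a reader can see the same tag several times during one scan.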
  • In addition to these, use of RFID also saves time, reduces labor, enhances the visibility of products throughout the production-delivery life cycle, and saves costs.
  • In this topic, we discussed in detail the uses of Big Data in the retail industry.
  • In today’s topic we will look at the various technologies used for handling Big Data.
  • Today we will further discuss how to make use of the enormous volume and variety of data at the required speed, with a suitable technology framework. So we will look at some of the major technologies related to Big Data that help store, process, and analyse the data and provide required business insights.
  • Rapid changes in technology are radically changing the way data is produced, processed, analysed, and consumed. A huge increase in the amount of data captured and analysed by organizations, as well as the data on the Internet, has fuelled the need for large-scale data storage and efficient processing of that data.
  • Some of the most popular areas of Big Data-related innovation are distributed and parallel computing, Hadoop, cloud computing for Big Data, and in-memory computing for Big Data.
    Of all these technologies, Hadoop is perhaps the name most identified with Big Data.
  • Distributed computing is a method in which multiple computing resources are connected in a network and computing tasks are distributed across them, thereby increasing the available computing power. Distributed computing can be faster and more efficient than traditional computing. It is hence of immense value when a huge amount of data must be processed in a limited time.
  • In parallel computing, the processing power of a standalone computer is enhanced by adding multiple processing units to carry out complex computations. These units process a complex task by breaking it up into subtasks and carrying out the individual subtasks simultaneously.
  • Today, markets and businesses are fiercely competitive. At the same time, the volume, variety, and velocity of available data have surged astronomically. To find an edge in the market, organizations feel the need to analyse all the data they can get hold of, in a very short span of time. This obviously leads to a requirement for large storage and processing power.
  • In spite of all the technological developments, one problem constantly affects the process of data collection: latency. Latency is the aggregate delay in a system caused by delays in individual tasks that involve large amounts of data. If you use a wireless phone, you may have experienced latency first-hand as a lag or delay in transmissions between you and the person you are calling.
  • This delay slows down system performance, data management, and communication within an organization, as well as with customers and other external stakeholders. Regular Big Data applications usually suffer from latency and hence perform less well. This is a potential problem for businesses.
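Breaking a task into subtasks that run simultaneously can be illustrated with Python's standard `concurrent.futures` module. The task (a sum of squares) is a toy stand-in. Each worker process plays the role of one processing unit.

```python
from concurrent.futures import ProcessPoolExecutor

def subtask(chunk):
    """One subtask: the sum of squares over a slice of the data."""
    return sum(x * x for x in chunk)

def split(data, n_parts):
    """Break the data into up to n_parts roughly equal chunks."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    # Each worker process handles one chunk; the chunks are processed simultaneously.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(subtask, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_sums)

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data) == subtask(data))  # True
```

Processes rather than threads are used because, in CPython, CPU-bound threads do not execute Python code in parallel.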
  • In response to these problems, distributed and parallel processing techniques provide concrete solutions. They help not just with processing large amounts of data in a short time, but also with dealing with latency.
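A simplified model shows why parallel processing helps with latency. If independent tasks run one after another, their delays add up. If they run simultaneously, the slowest task dominates. The delay figures and the coordination overhead are hypothetical, and the model ignores the network and scheduling costs of real systems.

```python
def sequential_latency(delays_ms):
    """Tasks run one after another, so their delays add up."""
    return sum(delays_ms)

def parallel_latency(delays_ms, overhead_ms=0):
    """Independent tasks run at the same time, so the slowest task dominates,
    plus any coordination overhead (an assumed figure)."""
    return max(delays_ms) + overhead_ms

# Hypothetical delays (milliseconds) of four independent tasks
task_delays_ms = [120, 80, 200, 50]
print(sequential_latency(task_delays_ms))                # 450
print(parallel_latency(task_delays_ms, overhead_ms=30))  # 230
```

The gain depends on the tasks being independent. Tasks that must wait on each other's results fall back toward the sequential sum.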
  • Distributed system: a collection of independent computer systems that are connected via a network to accomplish a specific task. The connected computers are loosely coupled and can access data and resources that are remotely located.
  • Parallel system: a computer system that has multiple processing units attached to it. These systems are tightly coupled and are usually employed to solve a single complex problem.
  • Several servers are connected to form a network so that the workload can be shared among them. A cluster built from the same type of commodity hardware is called a homogeneous cluster. A cluster built from a combination of different hardware is called a heterogeneous cluster.
    An organization can use hardware components acquired over a period of time to form a cluster or grid. This method is usually cost-effective. Grids also offer cost-effective storage solutions, although the overall costs may be high.
  • An MPP (massively parallel processing) platform is a single machine that works like a grid. It handles storage, memory, and computing tasks, and these capabilities are optimized by software written especially for the MPP platform. The platform is also optimized for scalability.
    MPP platforms are suitable for high-value uses. EMC Greenplum and ParAccel are examples of MPP platforms.
  • HPC (high-performance computing) environments offer very high performance and scalability. They use in-memory technology and are used for high-speed floating-point processing. You will read more about in-memory technology in the following sections.
    HPC environments are ideal for specialty applications and custom application development. They suit research or business organizations where high costs are acceptable because the results are very valuable or the project is strategically important.
  • A public cloud environment is a type of cluster or grid that can be accessed through the Internet. The cloud owner or vendor builds a cluster and then allows customers to use it for storage or computing tasks for a fee. Amazon EC2 is an example of a public cloud. A public cloud gives businesses the flexibility to buy computing power as they need it.
    In a private cloud environment, an organization's cluster is private and accessible only through its own network. A private cloud suits businesses that give high priority to data privacy. The cost of a private cloud is shared among the business units.
  • In this session, we will study Hadoop, one of the most preferred technologies for handling Big Data, in detail.
  • Hadoop is an open-source platform designed to work with huge volumes of structured and unstructured data, that is, Big Data. Working with such volumes of data requires deep analytical technology, which in turn requires greater computational power.
  • Let's look at some of the features of Hadoop:
    It can work on a large number of machines that do not share any memory or disks. This addresses the twin Big Data problems of efficient storage and access.
    Storage improves when data is loaded onto a Hadoop platform, because Hadoop distributes the data over different servers.
    Access improves because Hadoop keeps track of the data stored on the different servers.
    Processing improves because Hadoop runs computing tasks using all available processors working in parallel.
    Hadoop improves resilience by keeping multiple copies of the data, which can be used if a server fails.
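The resilience point can be illustrated by placing each block of data on several distinct servers and then checking whether all the data survives a failure. Three copies matches HDFS's default replication factor (`dfs.replication`). The round-robin placement, however, is a simplification, because real HDFS uses a rack-aware placement policy. The function names are hypothetical.

```python
REPLICATION = 3  # HDFS's default replication factor

def place_blocks(blocks, servers, replication=REPLICATION):
    """Assign each block to `replication` distinct servers in round-robin order.
    Simplified: real HDFS placement is rack-aware."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for n, block in enumerate(blocks):
        first = n % len(servers)
        placement[block] = [servers[(first + k) % len(servers)] for k in range(replication)]
    return placement

def readable_after_failure(placement, failed):
    """True if every block still has at least one copy on a live server."""
    return all(any(s not in failed for s in replicas) for replicas in placement.values())

servers = ["s1", "s2", "s3", "s4"]
placement = place_blocks(["b0", "b1", "b2"], servers)
print(placement["b1"])                                        # ['s2', 's3', 's4']
print(readable_after_failure(placement, {"s3"}))              # True
print(readable_after_failure(placement, {"s1", "s2", "s3"}))  # False
```

With three copies, any single server failure leaves every block readable. Data is lost only when all servers holding some block fail together.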
  • So how does Hadoop use multiple computing resources to execute a task?
    The Hadoop Distributed File System (HDFS) is a reliable, high-bandwidth, low-cost data storage cluster that facilitates the management of related files across machines.
    The Hadoop MapReduce Engine is a high-performance parallel/distributed data-processing implementation of the MapReduce algorithm.
  • Hadoop is designed to process large amounts of structured and unstructured data and is deployed on racks of commodity servers as a Hadoop cluster. Each server works independently on its task and returns its response. Servers can also be added to or removed from the cluster dynamically. This is possible because Hadoop is able to detect changes, including failures, adjust to them, and continue to operate without interruption.
  • MapReduce is the programming model that maps tasks to different servers and reduces the responses to one result.
    Hadoop MapReduce is an implementation of the MapReduce algorithm developed and maintained by the Apache project. It provides the capabilities to break data into manageable chunks, process the data in parallel on the distributed cluster, and then make the data available for user consumption or additional processing.
  • The map component of MapReduce distributes the programming problem or tasks across a large number of systems. It handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is complete, another function, called Reduce, aggregates all the elements back together to provide a result.
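The map-then-reduce flow can be simulated in a single process with the classic word-count example, which does not come from the source. In Hadoop, the map calls would run on many servers and the framework would perform the grouping (shuffle) step. Here everything runs locally so that the data flow is easy to follow.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group emitted values by key (Hadoop's framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(map_reduce(["Big data big", "data tools"]))  # {'big': 2, 'data': 2, 'tools': 1}
```

Because each map call looks at only one record and each reduce call looks at only one key, both phases can be spread across servers without coordination beyond the shuffle.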
  • When Hadoop receives an indexing job, the data of an organization is first loaded into the Hadoop software, which then divides the data into different pieces and sends each piece of data to different servers. Hadoop keeps track of the data by sending a job code to all the servers that store the relevant piece of data. Each server then applies the job code to the portion of data stored on it and returns the results.
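The divide, dispatch, and combine flow described above can be simulated with a thread pool, where each worker thread plays the role of one server. This is a hedged sketch under stated assumptions: `split_into_chunks`, `job`, and `run_job` are hypothetical names, threads stand in for servers, and unlike real Hadoop the data here is not already sitting on the machines that process it.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_servers):
    """Divide the data into roughly equal pieces, one per simulated server."""
    return [records[i::num_servers] for i in range(num_servers)]


def job(chunk, term):
    """The 'job code' each server applies to its own piece of the data."""
    return [record for record in chunk if term in record]


def run_job(records, term, num_servers=3):
    chunks = split_into_chunks(records, num_servers)
    # Send the same job code to every chunk and run the pieces in parallel.
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partial_results = list(pool.map(job, chunks, [term] * num_servers))
    # Combine each server's answer into one result.
    return sorted(r for part in partial_results for r in part)


records = [
    "call 21:05 user101",
    "call 20:45 user102",
    "sms 21:10 user103",
    "call 21:30 user104",
]
print(run_job(records, "call"))
```

The key design point mirrored here is that the small piece of code travels to each piece of data, and only the (much smaller) matching results travel back to be combined.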
  • This chart describes the process of job tracking in MapReduce.
  • Let's look at an example to understand how Hadoop works. Consider the records of all telephone calls in a city. Suppose a researcher wants to know how many college students made calls at the time of a particular event. The indexing query would specify the relevant user information and the time of the event. Each server would search its collection of call records and return the ones that match the query, and Hadoop would combine these sets into one result. Assume that all the call records are stored in CSV format on the servers. First, the data is loaded into Hadoop, and then the MapReduce programming model is used to process it. The CSV file has five columns: user_id, user_name, city_name, service_provider_name, and call_time.
  • To find the number of users (here, students) who made calls at a particular time, each user is identified by user_id. The final output is the total number of users who made calls during a particular time period, say, 9–10 pm. To produce it, the data is passed line by line to each mapper. When the mappers finish, the Hadoop framework shuffles and sorts their output, groups it by key, and sends it to the reducer, which produces the final output. The Hadoop platform also makes it possible to store data across many machines, so a business can run Hadoop on multiple commodity servers instead of building a single integrated system.
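The call-record example above can be sketched as a mapper, a grouping step, and a reducer that counts distinct users calling between 9 and 10 pm. This is a minimal single-machine sketch under assumptions: the sample rows, names, and the `YYYY-MM-DD HH:MM:SS` format of `call_time` are hypothetical, and because the five columns contain no "is a student" field, it counts all users rather than only college students.

```python
import csv
from collections import defaultdict
from datetime import datetime

# Hypothetical sample of the call-record CSV (timestamp format is assumed).
SAMPLE_CSV = """user_id,user_name,city_name,service_provider_name,call_time
101,Asha,Pune,ProviderA,2014-03-01 21:15:00
102,Ravi,Pune,ProviderB,2014-03-01 20:45:00
101,Asha,Pune,ProviderA,2014-03-01 21:40:00
103,Meena,Pune,ProviderA,2014-03-01 21:05:00
"""

START_HOUR, END_HOUR = 21, 22  # the 9-10 pm window


def mapper(line):
    """Emit (key, user_id) for each call that falls inside the time window."""
    row = next(csv.reader([line]))
    user_id, call_time = row[0], row[4]
    hour = datetime.strptime(call_time, "%Y-%m-%d %H:%M:%S").hour
    if START_HOUR <= hour < END_HOUR:
        yield ("calls_9_to_10pm", user_id)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, user_ids):
    """Count distinct users, so a user who called twice is counted once."""
    return key, len(set(user_ids))


lines = SAMPLE_CSV.strip().splitlines()[1:]  # skip the header row
pairs = [pair for line in lines for pair in mapper(line)]
for key, ids in shuffle(pairs).items():
    print(reducer(key, ids))  # ('calls_9_to_10pm', 2)
```

Using a single key sends every matching user_id to one reducer, which is simple but would become a bottleneck at real scale; a larger job might first key by user_id to deduplicate and then count in a second step.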
  • This topic deals with the various technologies used for handling Big Data.
  • In this session, we will look at cloud computing and various in-memory technologies for handling Big Data.
  • Cloud-based application platforms make computing resources readily available to an application and let you pay for those resources according to what you use and how much of it. In cloud computing, this feature is called elasticity: you can adjust and access computing resources dynamically at the touch of a button and pay only for what you use.
  • In cloud computing, data is gathered in data centers and then distributed to end-users. Automatic backup and recovery of data are also provided for business continuity. The main reason cloud computing and Big Data analytics complement each other is that both rely on distributed computing.
  • Amazon and Google are two large companies that need massive capacity to manage huge amounts of data to run their businesses. They need infrastructure and technologies that can support their applications at enormous scale. Think of the millions of Gmail messages that Google needs to process every minute, or even every second. Google has optimized the Linux OS and its software environment to handle e-mail efficiently, and it captures and leverages massive amounts of data about its mail and search users to drive its business. Similarly, Amazon's IaaS data centres are optimized to handle the massive workloads behind its services for a very large number of customers. Both companies now offer a range of cloud-based services for Big Data as well.
  • Scalability - When organizations increase the processing power of their own hardware, they may need to change the architecture and may have trouble running their software on the new hardware. The cloud avoids this problem by providing scalability through distributed computing.
    Elasticity - Cloud solutions let customers use and pay for exactly the amount of cloud service they need. For example, a business that expects more data during an in-store promotion can buy extra processing power for that period.
    Resource Pooling - Multiple organizations that need similar computing resources do not have to invest in them individually. The cloud offers these resources, and because they are shared by many customers, the cost for each customer comes down.
    Self-service - Customers can access cloud services directly through a user interface and choose the services they want. This process is automated and needs no human intervention.
    Low Costs - Businesses do not need a large initial investment in computing resources to handle huge operations such as Big Data analytics. They can sign up for a cloud service and pay as they go. The cloud provider enjoys economies of scale, which also benefits customers.
    Fault Tolerance - If part of the cloud fails, other parts can take over and give customers uninterrupted service.
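The elasticity and low-cost points above come down to simple arithmetic: provisioning fixed capacity means paying for the peak all day, while elastic billing charges only for what is used each hour. The demand profile and price below are hypothetical numbers chosen for illustration, not real cloud prices; costs are kept in integer cents to avoid floating-point rounding.

```python
# Hypothetical hourly demand: servers needed in each of 24 hours, with an
# evening spike during an in-store promotion.
HOURLY_DEMAND = [2] * 18 + [10] * 4 + [2] * 2
PRICE_CENTS_PER_SERVER_HOUR = 10  # hypothetical price


def fixed_capacity_cost(demand, price):
    """Provision enough servers for the peak hour, every hour of the day."""
    return max(demand) * len(demand) * price


def elastic_cost(demand, price):
    """Pay only for the servers actually used in each hour."""
    return sum(demand) * price


print(fixed_capacity_cost(HOURLY_DEMAND, PRICE_CENTS_PER_SERVER_HOUR))  # 2400
print(elastic_cost(HOURLY_DEMAND, PRICE_CENTS_PER_SERVER_HOUR))  # 800
```

With these assumed numbers the elastic option costs a third of the fixed one; the gap widens as demand becomes spikier.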
  • A public cloud is owned and operated by an organization for use by other organizations and individuals. It offers a range of computing services and, for each category of service, specializes in a specific type of workload. Specializing lets the provider customize hardware and software to optimize performance, which makes the computing process highly scalable; for example, a cloud can specialize in storing videos for live streaming on YouTube or Vimeo and be optimized to handle a large volume of traffic. For businesses, a public cloud provides economical storage and an efficient way to handle complex data analysis. However, it can also bring greater security risks and higher latency.
  • A private cloud is owned and operated by an organization for its own purposes. Besides employees, the organization's partners and customers may also use it. A private cloud is designed for one organization and incorporates that organization's systems and processes, including its business rules, governance policies, and compliance checks. Tasks that must be done manually in a public cloud, because many customers have different specifications, can be automated in a private cloud. A private cloud is therefore highly automated and is also protected by a firewall, which reduces latency and improves security and makes it well suited to Big Data analytics.
  • Apart from Big Data analytics, the cloud is used for purposes such as storage, backup, and customer service. As more people use computers on the go, business tasks have shifted to laptops and mobile devices and, in turn, to the cloud. A consumer may order a product from home; the store receives the order and sends instructions to the warehouse, which delivers the product. The store could use the cloud to receive the order and send instructions, as well as to handle payments and track deliveries. These tasks can also be done without cloud computing, but the cloud lowers infrastructure costs and provides scalable content storage.
  • Infrastructure as a Service (IaaS): Infrastructure refers to hardware, storage, and network. When you pay to store your holiday photographs in a cloud, you are using a public IaaS. When an employee saves a work report on the organization's backup server, the employee is using a private IaaS. IaaS provides hardware, storage, and network as a service; examples include virtual machines, load balancers, and network-attached storage. By using a public cloud IaaS, a business can avoid investing in physical infrastructure. The business chooses the OS, and IaaS lets it create virtual machines with scalable storage and processing power.
  • Platform as a Service (PaaS): PaaS provides a platform on which users write and run their applications. The platform refers to the OS together with a collection of middleware services and software development and deployment tools. Examples of PaaS are Windows Azure and Google App Engine (GAE). When an organization has a private cloud PaaS, programmers in its business units can create and deploy applications for their own needs. PaaS makes it easier to experiment with new applications.
  • Software as a Service (SaaS): SaaS provides software that can be accessed from anywhere. Customers use software on the cloud without buying it and installing it on their own devices, typically on monthly or yearly contracts. For SaaS to work, the infrastructure (IaaS) and the platform (PaaS) must already be in place. An organization can also maintain a custom-developed application in its private cloud and link it to Big Data stored in a public cloud. In such a hybrid cloud, the application can analyze the data efficiently by using the strengths of both private and public clouds.
  • Among the many established and new cloud service providers, some offer resources specifically for Big Data analytics. Let's look at a few of them. Amazon - Amazon's IaaS, called Elastic Compute Cloud (Amazon EC2), grew out of the company's massive computing infrastructure for its own business, much of which was underused, so Amazon decided to rent it out and earn revenue. The word “elastic” in the name fits because these resources can be scaled hour by hour.
  • In addition to Amazon EC2, Amazon Web Services offers the following services:
    Amazon Elastic MapReduce - A web service that provides cost-effective processing of vast amounts of data by using Amazon EC2 and Amazon Simple Storage Service (Amazon S3).
    Amazon DynamoDB - A NoSQL database service that stores data items on solid-state drives (SSDs) and replicates data for high availability and durability.
    Amazon Simple Storage Service (S3) - A web interface for storing data over the Internet and for web-scale computing.
    Amazon High-Performance Computing - A low-latency, high-bandwidth network with computational capabilities for solving problems in education and business.
    Amazon Redshift - A petabyte-scale data warehouse service for analysing data cost-effectively with existing business intelligence tools.
  • Now let's look more closely at Google's services designed for Big Data:
    Google Compute Engine - A secure and flexible virtual machine computing environment.
    Google BigQuery - A Data as a Service (DaaS) offering that searches large datasets at high speed using SQL-format queries.
    Google Prediction API - Identifies patterns in data, stores them, and improves them with each use.
  • Next, let's see what Windows Azure is all about. Building on Windows and SQL abstractions, Microsoft has produced a PaaS offering that includes development tools, virtual machine support, management and media services, and mobile device services. For customers with deep expertise in .NET, SQL Server, and Windows, adopting the Azure-based PaaS is straightforward. To meet the emerging need to integrate Big Data into Windows Azure solutions, Microsoft has also added Windows Azure HDInsight. HDInsight is built on the Hortonworks Data Platform (HDP), which according to Microsoft is 100 percent compatible with Apache Hadoop, and it supports connections to Microsoft Excel and other business intelligence tools. Azure HDInsight can also be deployed on Windows Server.
  • Large organizations store data in a central warehouse, and all users must access it from there, usually through the IT department. In-memory technology lets departments or business units take the part of the organization's data that is relevant to their needs and process it locally. This reduces the workload on the central warehouse, and users do not need the IT department to work with the data.
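The in-memory pattern described above, where a department pulls only its slice of warehouse data and analyzes it locally, can be sketched with SQLite's `:memory:` database from Python's standard library. This is an illustration under assumptions, not one of the commercial in-memory products the course refers to: the warehouse rows and the helper names `load_department_subset` and `revenue_by_product` are hypothetical.

```python
import sqlite3

# Hypothetical rows from a central warehouse: (department, product, revenue)
WAREHOUSE_ROWS = [
    ("marketing", "ads", 120),
    ("sales", "widgets", 300),
    ("sales", "gadgets", 150),
    ("marketing", "events", 80),
    ("sales", "widgets", 50),
]


def load_department_subset(rows, department):
    """Copy only one department's rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # the database lives entirely in RAM
    conn.execute("CREATE TABLE dept_data (product TEXT, revenue INTEGER)")
    conn.executemany(
        "INSERT INTO dept_data VALUES (?, ?)",
        [(product, revenue) for dept, product, revenue in rows if dept == department],
    )
    return conn


def revenue_by_product(conn):
    """Run an aggregate query locally, without touching the central warehouse."""
    cur = conn.execute(
        "SELECT product, SUM(revenue) FROM dept_data GROUP BY product ORDER BY product"
    )
    return cur.fetchall()


conn = load_department_subset(WAREHOUSE_ROWS, "sales")
print(revenue_by_product(conn))  # [('gadgets', 150), ('widgets', 350)]
conn.close()
```

Once the subset is loaded, every further query runs against local memory, which is the source of both the speed-up and the reduced load on the warehouse.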
  • In this session, we discussed cloud computing and various in-memory technologies for handling Big Data.

Big data gaurav Presentation Transcript

  • BUMPER
  • Class 1 Introduction to Big Data: Understanding Big Data; Business Applications of Big Data; Technologies for handling Big Data; Big Data Management Systems – Databases & Warehouses; Analytics & Big Data
  • Topic 1 Class 1 Introduction to Big Data Understanding Big Data
  • Topic 1 – Understanding Big Data: What is Big Data; Structuring & Elements; Application in Business & Careers
  • Data: Personal Computers, Facebook, Twitter, YouTube, Google, ATMs, Dropbox, Picasa
  • Online data grew from 5 exabytes in 2002 to 281 exabytes in 2009 (a 56-fold increase).
  • What is Big Data? A pool of large datasets that must be captured, stored, searched, shared, transferred, analysed, and visualised, together with related information or data, within an acceptable elapsed time.
  • Data = Information; Information = Insight
  • Every second, consumers make 10,000 payment card transactions worldwide.
  • Every hour, Walmart handles more than 1 million customer transactions.
  • Every day, Twitter's users post 500 million tweets.
  • Every day, Facebook users post 2.7 billion likes and comments.
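Because the figures above are quoted over different time periods, converting them to a common per-second rate makes the velocity of this data easier to compare. The figures are the ones on the slide; the helper `per_second` is a hypothetical name used only for this arithmetic.

```python
SECONDS_PER = {"second": 1, "hour": 3_600, "day": 86_400}


def per_second(count, period):
    """Convert a count over a given period into an average rate per second."""
    return count / SECONDS_PER[period]


FIGURES = [
    ("payment card transactions", 10_000, "second"),
    ("Walmart customer transactions", 1_000_000, "hour"),
    ("tweets", 500_000_000, "day"),
    ("Facebook likes and comments", 2_700_000_000, "day"),
]

for label, count, period in FIGURES:
    print(f"{label}: about {per_second(count, period):,.0f} per second")
```

Since the Walmart figure is a lower bound ("more than 1 million"), its computed rate of about 278 per second is a minimum.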
  • BIG DATA is a new data challenge that requires leveraging existing systems differently. It is classified in terms of Volume (terabytes, records, transactions), Variety (internal, external, behavioural, and/or social), and Velocity (near-real-time or real-time assimilation). It is usually unstructured and qualitative in nature.
  • Advantages of Studying Big Data: understanding the target customer; cutting down expenditure in healthcare; increasing operating margins in retail; increasing profits through improvements in operational efficiency
  • Industries that Benefit: Sports; Science and Research; Security and Law Enforcement; Financial Trading
  • • Procurement • Product Development • Manufacturing • Distribution • Marketing • Price Management • Merchandising • Sales • Store operations • Human Resources Departments that can Benefit:
  • Flu Indications & Warnings: Massive Data Collection → Analyse Collected Data → Early Warnings for Flu Outbreaks
  • Social Data from Networking Sites reveals Behavioural Patterns
  • Use Big Data for Growth & Value Addition
  • RECAP: What Big Data is, its advantages and its various sources
  • Topic 1 Class 1 - Introduction to Big Data Understanding Big Data
  • Class 1 - Introduction to Big Data: • What is Big Data? • Structuring & Elements • Application in Business & Careers
  • How do I keep myself updated on events and news? Which news articles should I read? How do I choose a book from the millions available on my favorite sites or stores? How can I use the vast amount of data and information I come across?
  • Formats of Data:
  • Sources of Data: • Internal: organisational or enterprise data • External: social data from the internet, or data from the government
  • BIG DATA: • Structured Data • Unstructured Data • Semi-Structured Data
  • Structured Data
  • Features of Structured Data: • Has a predefined format • Resides in fixed fields within a record • Has its attributes mapped • Is used to report against predetermined data types
  • Sources of Structured Data: • Relational databases • Flat files in record format • Multidimensional databases • Legacy databases
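To make "predefined format" and "fixed fields" concrete, here is a minimal sketch of structured data in a relational database, using Python's built-in sqlite3 module. The `customers` table, its columns and the rows are hypothetical and purely illustrative.

```python
import sqlite3

# Hypothetical customer table: every record has the same predefined,
# typed fields, which is what makes this data "structured".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city, age) VALUES (?, ?, ?, ?)",
    [(1, "Sam Jacobs", "Pune", 34), (2, "David Brown", "Delhi", 41), (3, "Asha Rao", "Pune", 29)],
)

# Because the fields are fixed and typed, a report against predetermined
# data types reduces to a simple query.
rows = conn.execute(
    "SELECT city, COUNT(*), AVG(age) FROM customers GROUP BY city ORDER BY city"
).fetchall()
for city, count, avg_age in rows:
    print(city, count, avg_age)
```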
  • Unstructured Data
  • Sources of Unstructured Data: • Organisational Data • Social Media • Mobile Data
  • Challenges of Using Unstructured Data: • Making sense of it is difficult and time-consuming • Combining and linking unstructured data with more structured information is difficult • It adds cost in terms of storage wastage and the human resources needed
  • Semi-Structured Data
  • Sources of Semi-Structured Data: • Database systems • File systems such as Web data and bibliographic data • Data exchange formats such as scientific data
  • Example of semi-structured data:
    Sl. No | Name | E-mail
    1. | Sam Jacobs | smj@xyz.com
    2. | First Name: David, Last Name: Brown | davidb@xyz.com
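The two records in the example share fields only loosely: record 2 splits the name into first and last parts. The sketch below encodes them as JSON and normalises the name. The JSON encoding and the field keys are assumptions made for illustration.

```python
import json

# The example's two records, written as JSON (an assumed encoding).
# Record 2 nests the name, so the records do not share a rigid schema.
raw = """
[
  {"sl_no": 1, "name": "Sam Jacobs", "email": "smj@xyz.com"},
  {"sl_no": 2, "name": {"first_name": "David", "last_name": "Brown"},
   "email": "davidb@xyz.com"}
]
"""

def full_name(record):
    """Return one display name whether 'name' is flat or nested."""
    name = record["name"]
    if isinstance(name, dict):
        return f"{name['first_name']} {name['last_name']}"
    return name

records = json.loads(raw)
normalised = [(r["sl_no"], full_name(r), r["email"]) for r in records]
print(normalised)
```

Code like `full_name` has to cope with structural variation record by record. That per-record handling is the extra work semi-structured data demands compared with a fixed relational schema.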
  • Volume
  • Velocity
  • Variety
  • Big Data Applications in Business Analytics
  • What are the areas where Big Data can be applied?
  • Transportation: provides improved traffic information and autonomous features
  • Education: enables innovative approaches for teachers to analyse student performance
  • Travel: applies analytics to pricing, inventory, and advertising to improve customer experiences
  • Governments: make informed decisions on fraud management, discover unknown threats, and ensure the security of the global supply chain
  • Healthcare: establishes clinical protocols that lead to the best health outcomes for patients
  • Careers in Big Data
  • BIG Career Opportunities
  • Major Big Data Hiring Companies: • Product companies, e.g., Oracle • Technology drivers, e.g., Google • Services companies, e.g., EMC • Data analytics companies, e.g., Splunk
  • The most common job titles in Big Data include: • Big Data Analyst • Big Data Scientist • Big Data Developer
  • Module 1 Introduction to Big Data (common to both tracks)
    Big Data Analyst Certification Track: Module 2 Introduction to Analytics & R Programming • Module 3 Data Analysis Using R • Module 4 Advanced Analytics Using R • Module 5 Machine Learning Concepts • Module 6 Social Media, Mobile Analytics & Visualisation • Module 7 Industry Applications of Big Data
    Big Data Developer Certification Track: Module 2 Managing a Big Data Ecosystem • Module 3 Storing & Processing Data: HDFS & MapReduce • Module 4 Increasing Efficiency with Hadoop Tools • Module 5 Additional Hadoop Tools: ZooKeeper, Sqoop, Flume, YARN & Storm • Module 6 Leveraging NoSQL & Hadoop: Real Time, Security & Cloud • Module 7 Commercial Hadoop Distribution & Management Tools
    Both tracks: Complete Project, leading to Wrox Certified Big Data Analyst / Developer
  • Technical Skills Required for a Big Data Analyst: • Handling & analysing massive data sets using MapReduce • Hadoop and its components HBase & Hive • SQL and Hadoop query languages such as Impala, Hive (HiveQL) and Pig (Pig Latin) • Analytical tools such as SAS, R and Tableau • Statistical techniques for implementing text analytics solutions • Data handling and manipulation techniques • Generating client-ready dashboards, reports and visualizations
  • Soft Skills Required: • Strong written & verbal communication skills • Analytical ability • Basic understanding of how a business works
  • Future of Big Data
  • RECAP: • The various types and structures of Big Data and the elements that form it • The business applications of Big Data and the associated career opportunities
  • BIG DATA
  • Topic 2 Business Applications of Big Data Class 1: Introduction to Big Data
  • Social Media
  • Topic 2 Business Applications of Big Data: • Significance of Social Network Data • Financial Fraud & Big Data • Fraud Detection in Insurance • Use in Retail Industry
  • Significance of Social Network Data: • What is Social Network Data? • What is Social Network Analysis? • What are the uses of Social Network Data Analysis? • What is Sentiment Analysis?
  • DATA
  • Social Media: • Age • Gender • Location
  • Social Network Analysis (SNA): Social Network DATA → Analysis
  • Total Number of Calls • Total Number of SMS
  • Structure of a Caller’s Social Network
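One way to build a caller's network from call and SMS totals is to aggregate call-detail records per contact. This is a minimal sketch. The record layout `(caller, callee, kind)` and the data are hypothetical. Only communications the caller initiated are counted, which is a simplifying assumption.

```python
from collections import Counter, defaultdict

# Hypothetical call-detail records: (caller, callee, kind),
# where kind is "call" or "sms".
records = [
    ("A", "B", "call"), ("A", "B", "sms"), ("A", "C", "call"),
    ("A", "D", "sms"), ("B", "C", "call"), ("A", "B", "call"),
]

def caller_profile(records, caller):
    """Total calls/SMS for one caller, plus counts per contact (outgoing only)."""
    totals = Counter()
    per_contact = defaultdict(Counter)
    for src, dst, kind in records:
        if src == caller:
            totals[kind] += 1
            per_contact[dst][kind] += 1
    return totals, per_contact

totals, per_contact = caller_profile(records, "A")
print("calls:", totals["call"], "sms:", totals["sms"])
for contact, counts in sorted(per_contact.items()):
    print(contact, dict(counts))
```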
  • Social Network Site
  • Social Network Analysis: A Big Data Problem
  • Social Network Analysis (SNA): • Business Intelligence • Marketing • Product Design & Development
  • Customer Relationship Management (CRM)
  • [Diagram: customer network grouped into connected clusters, labelled Group A and Group GH]
  • Social Network Data Analysis: • Provides new contexts in which decisions are data driven, not opinion driven • Enables organizations to shift their goal to maximizing the profitability of a customer's network • Enables organizations to identify highly connected customers
  • Social Network Data Analysis: • Enables organizations to attract highly connected customers with free trials and solicit their feedback • Enables organizations to encourage internal customers to become more active
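"Highly connected customers" can be approximated by degree: the number of distinct people each customer is linked to. This minimal sketch assumes a hypothetical list of undirected customer links and ranks customers by unnormalised degree centrality. Real SNA tools may use other centrality measures.

```python
from collections import defaultdict

# Hypothetical undirected links between customers (who communicates with whom).
links = [("c1", "c2"), ("c1", "c3"), ("c4", "c1"), ("c4", "c5"),
         ("c6", "c7"), ("c6", "c8"), ("c1", "c5")]

def degree(links):
    """Number of distinct neighbours per customer (unnormalised degree centrality)."""
    neighbours = defaultdict(set)
    for u, v in links:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(n) for node, n in neighbours.items()}

def most_connected(links, k):
    """Top-k customers by degree; ties broken alphabetically for a stable order."""
    deg = degree(links)
    return sorted(deg.items(), key=lambda item: (-item[1], item[0]))[:k]

top = most_connected(links, 2)
print(top)
```

The customers at the top of this list would be candidates for the free trials and feedback requests described on the slide.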
  • Social Data Analysis
  • Analyze Media Communication
  • [Diagram: DATA → System]
  • Product Development and Offerings
  • Sentiment Analysis: • Marketers • Business Professionals
  • 3,46,259 Followers and 2,73,591 Likes, yet this is one of the most disliked airlines. Why?
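Follower and like counts measure reach, not opinion. Sentiment analysis is what answers the "Why?". Below is a minimal lexicon-based sketch. The word lists and the posts are hypothetical, and production systems use much larger lexicons or trained models.

```python
import re

# Tiny illustrative lexicon (hypothetical).
POSITIVE = {"great", "love", "friendly", "comfortable"}
NEGATIVE = {"delayed", "rude", "lost", "cancelled", "worst", "dirty"}

def sentiment(post):
    """Score = positive word count minus negative word count; the sign gives polarity."""
    words = re.findall(r"[a-z]+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "Flight delayed again and the crew was rude",
    "Love the new lounge, great coffee",
    "Bag lost. Worst airline ever",
    "Boarding now",
]
labels = [sentiment(p) for p in posts]
negative_share = labels.count("negative") / len(labels)
print(labels, negative_share)
```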
  • RECAP: • What social network data and social network analysis are • Their uses and value
  • BIG DATA
  • BANK
  • Common Financial Frauds: • Credit Card Frauds • Exchange or Return Policy Fraud • Personal Information Fraud
  • Prevent Frauds: • Understand customers' ordering patterns • Watch out for red flags
  • Big Data
  • Analysing a small data sample: cannot capture the various patterns of fraud • Analysing a large data sample: can capture the various patterns of fraud • Traditionally, the sample size could not be increased because that required huge investments of time and money • Big Data techniques can overcome this challenge
  • Big Data analytics can: • Run checks on all data to identify fraudulent transactions • Identify new ways of committing fraud and add them to a set of fraud-prevention checks • Avoid impeding customers with unnecessary policies and governance structures
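The idea of running checks on all data and adding newly discovered fraud patterns to the set can be sketched as a list of rule functions applied to every transaction. The `Transaction` fields, the rules and their thresholds are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    txn_id: str
    customer: str
    amount: float
    country: str
    home_country: str

# Each check returns True when a transaction looks suspicious (hypothetical rules).
def large_amount(t):
    return t.amount > 5000

def foreign_country(t):
    return t.country != t.home_country

CHECKS = [large_amount, foreign_country]

def flag(transactions, checks):
    """Run every check on every transaction; map ids to the checks they failed."""
    flagged = {}
    for t in transactions:
        failed = [c.__name__ for c in checks if c(t)]
        if failed:
            flagged[t.txn_id] = failed
    return flagged

txns = [
    Transaction("t1", "u1", 120.0, "IN", "IN"),
    Transaction("t2", "u2", 9000.0, "IN", "IN"),
    Transaction("t3", "u3", 40.0, "SG", "IN"),
    Transaction("t4", "u4", 15.0, "IN", "IN"),
]
before = flag(txns, CHECKS)

# A newly identified fraud pattern becomes one more function in the list
# (here, a hypothetical "small round test amount" rule).
def round_small_amount(t):
    return t.amount < 20 and t.amount == int(t.amount)

CHECKS.append(round_small_amount)
after = flag(txns, CHECKS)
print(before)
print(after)
```

Keeping each rule as an independent function means adding a check never changes the existing ones. Customers only meet extra friction when a rule actually fires.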
  • Fraud Detection in Real Time: live transactions and other sources of data → BIG DATA
  • BIG DATA: Historical data → indicates fraud patterns → checks to prevent real-time fraud
  • Real-time Analysis
  • BIG DATA: Create comparisons → Draw maps & graphs → Make decisions and build effective systems → BLOCK FRAUD
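A simple way to turn historical patterns into a real-time check is to compare each live transaction against the customer's historical spending. This minimal sketch uses a z-score against a per-customer mean and standard deviation. The history, the customer ids and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical historical spend per customer.
history = {
    "u1": [40.0, 55.0, 38.0, 60.0, 47.0],
    "u2": [900.0, 1100.0, 950.0, 1020.0],
}

# Pre-compute each customer's normal range once, offline.
profiles = {c: (mean(v), stdev(v)) for c, v in history.items()}

def check_live(customer, amount, threshold=3.0):
    """'BLOCK' if the amount is more than `threshold` standard deviations
    above the customer's historical mean, otherwise 'ALLOW'."""
    mu, sigma = profiles[customer]
    if sigma == 0:
        return "BLOCK" if amount > mu else "ALLOW"
    z = (amount - mu) / sigma
    return "BLOCK" if z > threshold else "ALLOW"

decisions = [check_live("u1", 52.0), check_live("u1", 800.0), check_live("u2", 1000.0)]
print(decisions)
```

The expensive step (building profiles from history) runs ahead of time, so the live check is a constant-time comparison.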
  • Insurance Company: • Needs to improve its ability to make decisions in real time when processing a new claim, thereby reducing the claim cycle time • Incurs a steady increase in the cost of litigation and fraudulent claims • Its underwriters do not have the required data at the right time to make the necessary decisions, further delaying processing
  • BIG DATA: Social Media Data → Note for the underwriter
  • Social Media Triggers to Identify Fraud: In a claim, a customer might state that his or her car was destroyed in a flood, while documentation from the social media feed shows that the car was actually in another city on the day the flood occurred. Such glaring discrepancies indicate FRAUD.
  • Insurance Frauds: • Have a huge cost implication for the organization • Organizations therefore prefer using Big Data analytics and other advanced technologies • Reducing fraud has a positive impact on customers, because fraud losses are otherwise passed on to them as higher premiums
  • INSURANCE: With a Big Data analytics platform, organizations are now able to analyze complex information and accident scenarios in minutes rather than days or months
  • Fraud Detection Methods: Statistical Models • Typically use small samples of data for analysis • Rely on previously recorded fraud cases, so every time a fraud based on a new technique occurs, insurance companies bear the consequences and losses for the first time • Work in independent silos, as the traditional method of identifying frauds does • Cannot handle information from different channels and functions in an integrated way
  • Public Data: • Bank Statements • Legal Judgments • Criminal Records • Medical Bills
  • Social Network Analysis (SNA): • Big Data can be used to create visibility into blind spots for businesses • SNA is an innovative and effective way to identify and detect frauds
  • An SNA tool uses a mix of analytical methods: • Statistical methods • Pattern analysis • Link analysis
  • When link analysis is used in fraud detection, it: • Looks for clusters of data • Examines how those data clusters are linked to other data clusters • Integrates public records and other data sources into a model • Lets the insurer rate claims
  • When link analysis is used in fraud detection: a high rating indicates that the claim is likely fraudulent, for example because it involves: • a known bad address • a suspicious provider • a vehicle that was involved in many accidents with multiple carriers
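Link analysis can be sketched by connecting claims that share an entity (address, provider or vehicle) into clusters, then rating each cluster by the red flags it contains. This minimal sketch uses a union-find structure. The claims, the watch-lists and the one-point-per-flag rating are hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical claims and the entities each one mentions.
claims = {
    "C1": {"address": "12 Elm St", "provider": "P9", "vehicle": "V1"},
    "C2": {"address": "12 Elm St", "provider": "P2", "vehicle": "V2"},
    "C3": {"address": "7 Oak Ave", "provider": "P9", "vehicle": "V3"},
    "C4": {"address": "3 Pine Rd", "provider": "P5", "vehicle": "V4"},
}
BAD_ADDRESSES = {"12 Elm St"}        # hypothetical watch-list
SUSPICIOUS_PROVIDERS = {"P9"}        # hypothetical watch-list

parent = {c: c for c in claims}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Link any two claims that share an entity value.
seen = {}
for claim_id, entities in claims.items():
    for key, value in entities.items():
        if (key, value) in seen:
            union(claim_id, seen[(key, value)])
        else:
            seen[(key, value)] = claim_id

clusters = defaultdict(list)
for claim_id in claims:
    clusters[find(claim_id)].append(claim_id)

def rate(cluster):
    """Count red flags across the whole linked cluster (deliberately simple)."""
    return sum((claims[c]["address"] in BAD_ADDRESSES) +
               (claims[c]["provider"] in SUSPICIOUS_PROVIDERS) for c in cluster)

ratings = {tuple(sorted(m)): rate(m) for m in clusters.values()}
print(ratings)
```

Claim C3 looks clean on its own address. It is rated through its link to C1's suspicious provider, which is the kind of blind spot link analysis is meant to expose.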
  • How fast does data arrive?
  • How much unneeded data is there when it arrives?
  • How deep should the analysis go to produce the most accurate results?
  • What type of user interface components need to be included on the SNA dashboard?
  • SNA method to detect fraud: • Structured and unstructured data from various sources is fed into the ETL (Extract, Transform, and Load) tool • This data is then transformed and loaded into a data warehouse • The analytics team uses information from various sources to score the risk of fraud and rank the likelihood of fraud • This information can come from varied sources: prior belief, previous relationships, number of rejected claims, etc. • Big Data technologies such as text mining, sentiment analysis, content categorization, and social network analysis are incorporated into the fraud detection and predictive modeling mechanism
  • SNA method to detect fraud (continued): • Depending on the score of a particular network, an alert is generated • Investigators can leverage this information and begin researching the fraudulent claim further • Identified fraud issues are added to the case management system
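The score-rank-alert step can be sketched as a weighted sum of the risk factors named above (prior belief, previous relationships, rejected claims). Networks at or above a cut-off are passed to a case list. The weights, the 0.6 threshold and the factor values (already scaled to 0..1) are made up for illustration.

```python
# Hypothetical weights for the factors named on the slide.
WEIGHTS = {"prior_belief": 0.4, "previous_relationship": 0.3, "rejected_claims": 0.3}
ALERT_THRESHOLD = 0.6  # assumed cut-off

def score(network):
    """Weighted sum of factors, each already scaled to [0, 1]."""
    return sum(WEIGHTS[f] * network[f] for f in WEIGHTS)

def triage(networks):
    """Rank networks by score and return (name, score) for those at or above the threshold."""
    ranked = sorted(networks.items(), key=lambda kv: score(kv[1]), reverse=True)
    return [(name, round(score(f), 2)) for name, f in ranked if score(f) >= ALERT_THRESHOLD]

case_system = []  # stand-in for the case management system
networks = {
    "N1": {"prior_belief": 0.9, "previous_relationship": 0.8, "rejected_claims": 0.5},
    "N2": {"prior_belief": 0.2, "previous_relationship": 0.1, "rejected_claims": 0.0},
    "N3": {"prior_belief": 0.7, "previous_relationship": 0.6, "rejected_claims": 0.6},
}
alerts = triage(networks)
case_system.extend(alerts)  # investigators pick these up for further research
print(alerts)
```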
  • Predictive analysis works on the principle that the earlier a fraud is detected, the smaller the loss incurred by a business.
  • Fraud detection with BIG DATA: • Text analytics • Sentiment analysis • Predictive analytics
  • Predictive Analytics Technology: • Claim adjusters write lengthy reports while investigating a claim • These reports contain hidden clues that a claims adjuster may not notice • A computing system based on business rules highlights clues to possible fraud • The fraud detection system spots these discrepancies and flags the claim as possibly fraudulent
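A rule-based clue highlighter for adjusters' reports can be as simple as a set of named text patterns. This sketch assumes three hypothetical rules written as regular expressions. Real systems rely on much richer text analytics.

```python
import re

# Hypothetical business rules: phrases treated as possible fraud clues.
RULES = {
    "prior_damage": re.compile(r"\b(pre-?existing|prior) damage\b", re.I),
    "cash_only": re.compile(r"\bcash only\b", re.I),
    "no_witness": re.compile(r"\bno witness(es)?\b", re.I),
}

def highlight_clues(report):
    """Names of the rules whose pattern appears in the report, sorted."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(report))

report = ("Vehicle shows pre-existing damage on the rear bumper. "
          "Claimant states there was no witness and the repair shop is cash only.")
clues = highlight_clues(report)
print(clues, "FLAG" if clues else "OK")
```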
  • Customer Relationship Management (CRM)
  • The following briefly describes how a Social CRM process works: • It uses the organization's existing CRM to gather data from various social media platforms • A "listening" tool extracts data from social chatter, which acts as reference data for the existing data in the organization's CRM • This reference data, along with the information stored in the CRM, is fed into a case management system • The case management system analyzes the information on the basis of the organization's business rules and sends a response • Investigators confirm the case management system's response on a fraudulent claim
  • Use of Big Data in Retail Industry BIG DATA
  • Use of Big Data in Retail Industry: • How many basic tees did we sell today? • What time of the year do we sell the most leggings? • What else has customer X bought? • What kind of coupons can we send to customer X?
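The first three retail questions reduce to simple aggregations over sales records. The sketch below answers them on a small hypothetical data set of `(date, customer, product, quantity)` tuples. Choosing coupons for customer X would build on the purchase history returned by `purchases_of`.

```python
from collections import Counter
from datetime import date

# Hypothetical sales records: (date, customer, product, quantity).
sales = [
    (date(2014, 3, 10), "X", "basic tee", 2),
    (date(2014, 3, 10), "Y", "basic tee", 1),
    (date(2014, 3, 10), "X", "leggings", 1),
    (date(2014, 11, 2), "Z", "leggings", 3),
    (date(2014, 11, 20), "Y", "leggings", 2),
    (date(2014, 6, 5), "X", "scarf", 1),
]

def units_sold(product, day):
    """How many units of a product were sold on a given day?"""
    return sum(q for d, _, p, q in sales if p == product and d == day)

def best_month(product):
    """Which month sells the most units of a product?"""
    by_month = Counter()
    for d, _, p, q in sales:
        if p == product:
            by_month[d.month] += q
    return by_month.most_common(1)[0][0]

def purchases_of(customer):
    """Which products has a customer bought?"""
    return sorted({p for _, c, p, _ in sales if c == customer})

print(units_sold("basic tee", date(2014, 3, 10)))
print(best_month("leggings"))
print(purchases_of("X"))
```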
  • Use of Big Data in Retail Industry: • In-store Sales • Online Sales
  • Most Big Data is neither required nor useful: • Some information will have long-term strategic value • Some will be useful only for immediate, tactical use • Some data won't be used for anything at all
  • Use of RFID (Radio Frequency Identification) Data in Retail: An RFID tag is a small tag that carries a unique code identifying a product, much like a UPC code. The tag is placed on shipping pallets or product packages (see the adjacent image).
  • Beyond what a bar code offers, an RFID tag: • Identifies a pallet to a precise and exclusive set of computer systems • Helps find situations where an item has no units left in the store • Indicates the number of units of each item remaining in the store, raising an alarm when restocking is required • Enables better tracking of products by distinguishing products that are out of stock from products that are available on the shelf
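Counting units and raising a restock alarm from RFID reads can be sketched as follows. The tag ids, SKUs and per-product reorder levels are hypothetical. Duplicate reads of the same tag are collapsed with `set()` because a reader can see one tag many times.

```python
from collections import Counter

# Hypothetical shelf-reader scan (tag ids, with a duplicate read of T1).
scan = ["T1", "T2", "T3", "T4", "T5", "T6", "T1"]
tag_to_sku = {"T1": "TEE-01", "T2": "TEE-01", "T3": "TEE-01",
              "T4": "LEG-02", "T5": "CAP-03", "T6": "CAP-03"}
reorder_level = {"TEE-01": 2, "LEG-02": 2, "CAP-03": 1, "SCARF-04": 1}

def stock_alerts(scan, tag_to_sku, reorder_level):
    """Count units per SKU from unique tag reads and report low or zero stock."""
    on_shelf = Counter(tag_to_sku[t] for t in set(scan))
    alerts = {}
    for sku, level in reorder_level.items():
        units = on_shelf.get(sku, 0)
        if units == 0:
            alerts[sku] = "OUT OF STOCK"
        elif units < level:
            alerts[sku] = f"RESTOCK ({units} left)"
    return alerts

alerts = stock_alerts(scan, tag_to_sku, reorder_level)
print(alerts)
```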
  • Use of RFID Data in Retail: • Saves time • Reduces labor • Enhances the visibility of products throughout the production-delivery life cycle • Saves costs
  • RECAP: • The significance of social network data, financial fraud, and fraud detection in insurance • The uses of Big Data in the retail industry, and RFID data and its advantages
  • Topic 3 Class 1 - Introduction to Big Data Technologies for Handling Big Data
  • Topic 3 - Technologies for Handling Big Data: • Distribution & Computing for Big Data • Introducing Hadoop • Cloud Computing & In-Memory Technologies for Big Data
  • DATA PROCESSING Analysed
  • BIG DATA: • Distributed & Parallel Computing • Hadoop • Cloud • In-Memory Computing
  • [Illustration of latency: the Transmitter says "Hello?" and the Receiver replies "I can’t hear you…"]
  • Issues caused by Latency: • Slowdown in system performance • Data management • Internal organisational communication • External communication
  • Distributed and parallel processing techniques process large amounts of data and also deal with latency.
  • Distributed System: A collection of independent computer systems that are connected via a network to accomplish a specific task.
  • Parallel System A computer system that has multiple processing units attached to it.
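The split-process-combine pattern behind distributed and parallel processing can be sketched with Python's `concurrent.futures`. Worker threads stand in for separate machines or processing units here. For CPU-bound work in real code, `ProcessPoolExecutor` or an actual cluster would be the usual choice. The data and the per-chunk task (sum of squares) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_parts):
    """Divide the data into n roughly equal chunks, one per worker."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    """Independent work on one chunk: a partial sum of squares."""
    return sum(x * x for x in chunk)

data = list(range(1, 1001))
chunks = split(data, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))
total = sum(partials)  # combine the partial results
print(len(chunks), total)
```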
  • Parallel Computing Techniques: • Clusters or Grids • Massively Parallel Processing (MPP) • High-Performance Computing (HPC)
  • Public Cloud vs Private Cloud
  • Features of Hadoop: • Works on multiple machines that do not share memory • Distributes data over different servers • Keeps track of data stored on different servers • Runs all available servers in parallel • Keeps multiple copies of data
  • [Diagram: Hadoop Cluster with a Gateway Node, a Switch, and Server 1 to Server 5]
  • MapReduce
  • How does Hadoop work? • An organisation's data is loaded into the Hadoop software • The data is divided into pieces and sent to different servers • Hadoop keeps track of where each piece is stored and sends the job code to all the servers that store the relevant pieces of data • Each server applies the job code to the portion of data stored on it and returns the results
  • [Diagram: Indexing Job. The Hadoop Software sends Job Code 1, 2 and 3 to Server 1, 2 and 3; each server runs its job code on the data it stores, and the outputs are combined into the Result]
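The steps above (divide data into pieces, keep multiple copies, track where pieces live, send the job code to the data) can be sketched in a few lines. The placement map plays the role the NameNode plays in HDFS. Assumptions to note:
- Placement here is plain round-robin, whereas HDFS placement is rack-aware.
- `block_size` counts records here, whereas HDFS blocks are measured in bytes.
- The replication factor of 3 mirrors the HDFS default.

The server names and records are hypothetical.

```python
SERVERS = ["server1", "server2", "server3", "server4", "server5"]
REPLICATION = 3  # mirrors the HDFS default replication factor

def place_blocks(records, block_size):
    """Split data into blocks and record which servers hold a copy of each
    (round-robin placement, a simplification of HDFS's rack-aware policy)."""
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    placement = {
        block_id: [SERVERS[(block_id + k) % len(SERVERS)] for k in range(REPLICATION)]
        for block_id in range(len(blocks))
    }
    return blocks, placement

def run_job(job_code, blocks, placement):
    """Send the job to a server that already holds each block (data locality)
    and collect (server, result) per block."""
    results = {}
    for block_id, block in enumerate(blocks):
        server = placement[block_id][0]  # first replica does the work
        results[block_id] = (server, job_code(block))
    return results

records = [f"line {i}" for i in range(10)]
blocks, placement = place_blocks(records, block_size=4)
results = run_job(len, blocks, placement)  # job code: count records per block
print(placement)
print(results)
```

If `server1` failed, block 0 could still be processed on `server2` or `server3`. That is why multiple copies matter.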
  • EXAMPLE: a data set with the fields user_id, user_name, city_name, service_provider_name and call_time
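For a data set with the example fields (user_id, user_name, city_name, service_provider_name, call_time), a typical MapReduce job might count calls per city and service provider. This is a single-process sketch of the map, shuffle and reduce phases. The rows and the choice of key are hypothetical, and a real Hadoop job would run the mapper and reducer across the cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows: (user_id, user_name, city_name, service_provider_name, call_time)
rows = [
    (1, "Asha", "Pune", "ProviderA", "2014-05-01 09:10"),
    (2, "Ravi", "Delhi", "ProviderB", "2014-05-01 09:12"),
    (1, "Asha", "Pune", "ProviderA", "2014-05-01 18:40"),
    (3, "Meera", "Pune", "ProviderB", "2014-05-02 11:05"),
]

def mapper(row):
    """Map: emit ((city, provider), 1) for every call."""
    _, _, city, provider, _ = row
    yield (city, provider), 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce: total calls per key."""
    return key, sum(values)

mapped = chain.from_iterable(mapper(r) for r in rows)
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)
```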
  • RECAP: • Various aspects of distribution and computing for Big Data • Hadoop as a technology for handling Big Data
  • Features of Cloud Computing: • Scalability • Elasticity • Resource Pooling • Self-Service • Low Costs • Fault Tolerance
  • What are Cloud Deployment Models?
  • PRIVATE CLOUD
  • Categories of Cloud Services:
  • Other Amazon Web Services: • Amazon Elastic MapReduce • Amazon DynamoDB • Amazon S3 • Amazon High-Performance Computing • Amazon Redshift
  • Google Cloud Services: • Google Compute Engine • Google BigQuery • Google Prediction API
  • Windows Azure
  • In-memory technology makes it possible for departments or business units to take the part of the organizational data that is relevant to their needs and process it locally.
  • RECAP In this session we discussed cloud computing & various in-memory technologies for handling Big Data.