Big Data with Hadoop, Spark
and BigQuery
Google Cloud Next Extended 2017
Speaker: Imam Raza
Speaker.bio.toString()
Senior Software Architect @Folio3
Specialities:
Designing scalable Enterprise Software Architecture,
Designing scalable mobile app.
IBM Big Data certified professional.
MongoDB certified professional.
About this presentation
me.loveQuestion == true. Let's have an interactive session.
The content is based on industry experience.
There will be some lab sessions.
Switching gears now and then with interesting Silicon Valley facts.
Agenda
What is Big Data?
What are the Big Data components?
What is Hadoop?
What is Spark?
What is BigQuery?
Designing scalable Vs fashionable applications.
What is Big Data?
Google Trends on Big Data
Big Data Definition
Five Vs of Big Data
1st V:Velocity
2nd V: Volume
Growth in Global Data
How big is Zettabyte?
3rd V: Variety
4th V: Veracity
5th V: Value
Value
Big Data Business
implementation
Recommendation Engines
Netflix show “House of Cards” was an immediate hit
Big data business application
Better understand and target customers
Understand and optimize Business process
Improving Health
Improving security and Law enforcement
Improving sports performance
Improving and optimizing Cities and Countries
Types of Source of Big Data
Structured Data (RDBMS, Spreadsheets)
Unstructured Data (raw data)
Semi-Structured Data (XML,JSON)
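As a small illustration of the three categories, the sketch below holds one hypothetical customer record in each form. The field names and values are invented for the example.

```python
import csv
import io
import json

# Structured: fixed columns, as in an RDBMS table or spreadsheet export.
structured = "id,name,city\n1,Ayesha,Karachi\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys; fields may vary from record to record.
semi_structured = '{"id": 1, "name": "Ayesha", "tags": ["premium", "mobile"]}'
record = json.loads(semi_structured)

# Unstructured: free text with no schema; meaning has to be extracted.
unstructured = "Ayesha from Karachi called about her premium mobile plan."
words = unstructured.lower().rstrip(".").split()

print(rows[0]["city"])      # column access through a known schema
print(record["tags"])       # nested field discovered from the data itself
print("premium" in words)   # naive keyword search over raw text
```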
Switching gears
A mandatory book for Silicon Valley
graduates looking for jobs.
Big Data Ecosystem
Big Data Tools
● Hadoop
● YARN
● Mesos
● Spark
● BigQuery
● BigSQL
● Kafka
● Hive
● Pig
● Sqoop
● ZooKeeper
● HBase
● Shark
● Cassandra
● MongoDB
● CouchDB
● Cloudera
● Pentaho
● etc
Spark Hadoop System
Hadoop
Hadoop is an open-source software framework that
supports data-intensive distributed applications
A Hadoop cluster is composed of a single master node
and multiple worker nodes
Hadoop Primary Components
HDFS – Hadoop Distributed File System.(Storing large
amounts of data)
MapReduce Programming Model- (Processing large
amounts of data)
HDFS
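A rough sketch of how HDFS lays out a large file: the file is cut into fixed-size blocks and each block is copied to several worker nodes. The 128 MB block size and replication factor of 3 are the common Hadoop 2.x defaults and are used here as assumptions. `plan_blocks` and its round-robin placement are hypothetical simplifications; real HDFS placement is rack-aware.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size (Hadoop 2.x)
REPLICATION = 3                 # assumed default replication factor

def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    plan = {}
    for b in range(n_blocks):
        # Each replica of a block goes to a different node.
        plan[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return plan

plan = plan_blocks(1024 * 1024 * 1024, ["worker1", "worker2", "worker3", "worker4"])
print(len(plan))  # number of blocks for a 1 GiB file
print(plan[0])    # nodes holding the first block
```

Losing any single worker in this layout still leaves two copies of every block, which is the point of replication.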
Moving Code to Data Philosophy
If code and data are on different machines, one of them must be moved to
the other machine before the code can be executed on the data.
If the code is smaller than the data, better to send the code to the machine
holding the data than the other way around, if all the machines are equally
fast.
In the world of Big Data, the code is almost always smaller than the data.
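The argument can be put in numbers. Assuming equally fast machines, the cost of either choice is dominated by how many bytes cross the network. The sizes below are made up for illustration, and the function names are hypothetical.

```python
def bytes_moved(code_size, data_size, move):
    """Bytes sent over the network for the chosen strategy."""
    if move == "code":
        return code_size
    if move == "data":
        return data_size
    raise ValueError("move must be 'code' or 'data'")

def cheaper_to_move(code_size, data_size):
    """Pick whichever side is smaller, as the slide recommends."""
    return "code" if code_size <= data_size else "data"

code_size = 5 * 1024 ** 2  # hypothetical 5 MiB job archive
data_size = 2 * 1024 ** 4  # hypothetical 2 TiB of logs

choice = cheaper_to_move(code_size, data_size)
# How many times more bytes would cross the network if the data moved instead.
saving = (bytes_moved(code_size, data_size, "data")
          // bytes_moved(code_size, data_size, "code"))
print(choice, saving)
```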
Job/Task Management
MapReduce
MapReduce Example
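The slide's own example is not recoverable from the transcript. Word count is the customary illustration of the model, so here is a minimal single-process sketch of its three phases (map, shuffle, reduce). The function names are illustrative; a real Hadoop job would run the map and reduce steps in parallel across worker nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())

counts = word_count(["big data big ideas", "Big Query"])
print(counts)
```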
                        Hadoop / MapReduce                      RDBMS
Size of data            Petabytes                               Gigabytes
Integrity of data       Low                                     High (referential, typed)
Data schema             Dynamic                                 Static
Access method           Batch                                   Interactive and batch
Scaling                 Linear                                  Nonlinear (worse than linear)
Data structure          Unstructured                            Structured
Normalization of data   Not required                            Required
Query response time     Has latency (due to batch processing)   Can be near immediate
Apache Spark
Apache Spark is a lightning-fast cluster computing technology
designed for quick computation.
It builds on the MapReduce model and extends it to efficiently
support more types of computation, including interactive
queries and stream processing.
Apache Spark features
Speed: Spark can run an application in a Hadoop cluster up to
100 times faster than MapReduce in memory, and up to 10 times
faster when running on disk.
Multi-language support: provides built-in APIs in Java, Scala, and
Python.
Advanced Analytics: Supports SQL queries, Streaming data,
Machine learning (ML), and Graph algorithms.
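Much of the in-memory speedup claimed above comes from not re-reading the input on every pass of an iterative job. The following is a pure-Python illustration of that idea only, not Spark code. `CountingSource` is a hypothetical stand-in for a dataset stored on disk.

```python
class CountingSource:
    """Hypothetical dataset on disk that counts how often it is fully scanned."""

    def __init__(self, records):
        self._records = list(records)
        self.scans = 0

    def read(self):
        self.scans += 1
        return list(self._records)

def run_without_cache(source, passes):
    """Re-read the source on every pass, as a chain of disk-based jobs would."""
    return [sum(source.read()) for _ in range(passes)]

def run_with_cache(source, passes):
    """Read once, keep the data in memory, and reuse it on every pass."""
    cached = source.read()
    return [sum(cached) for _ in range(passes)]

disk_style = CountingSource(range(1000))
memory_style = CountingSource(range(1000))
a = run_without_cache(disk_style, 5)
b = run_with_cache(memory_style, 5)
print(disk_style.scans, memory_style.scans)
```

Both runs produce the same results; only the number of full scans differs, and on a real cluster each scan means disk and network I/O.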
Apache Spark Libs
Apache Spark Lab Session
Via http://datascientistworkbench.com
Switching gears
Silicon Valley wakes up early in the
morning
Big Query
BigQuery
A service that enables interactive analysis of massively large datasets
Based on Dremel, a scalable, interactive ad hoc query system for analysis
of read-only nested data
Works in conjunction with Google Cloud Storage
Has a RESTful web service interface.
BigQuery
You can issue SQL queries over big data
Interactive web interface
Response times as short as possible
Auto scales under the hood
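As a hedged sketch of what issuing SQL through the REST interface looks like, the code below only builds the request for BigQuery's `jobs.query` endpoint and does not send it. The project ID `my-project` is a placeholder, the query targets the public `bigquery-public-data.samples.shakespeare` table, and authentication (an OAuth 2.0 bearer token) is left out.

```python
import json

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"

def build_query_request(project_id, sql, max_results=10):
    """Return the URL and JSON body for a synchronous jobs.query call."""
    url = f"{API_ROOT}/projects/{project_id}/queries"
    body = {"query": sql, "useLegacySql": False, "maxResults": max_results}
    return url, json.dumps(body)

# Ten most frequent words across the sample Shakespeare corpus.
SQL = """
SELECT word, SUM(word_count) AS total
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY total DESC
LIMIT 10
"""

url, payload = build_query_request("my-project", SQL.strip())
print(url)
print(payload)
```

The same SQL can be pasted into the web console used in the lab session.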
BigQuery
SaaS (/ PaaS)
Interfacing:
REST API
Web console
Command line tools
Language libraries
Insert only
BigQuery Lab Session
Via https://bigquery.cloud.google.com
Switching gears
Zareen is a Pakistani restaurant in
Google Mountain View.
1477 Plymouth Street, Suite C
Mountain View, CA 94043
http://www.zareensrestaurant.com/
Designing Scalable Vs
Fashionable apps
References
https://cloud.google.com/bigquery/public-data/
https://bigquery.cloud.google.com
IBM Big Data virtual lab (https://datascientistworkbench.com/)
IBM Big Data University (http://bigdatauniversity.com)
Questions

Big Data with Hadoop, Spark and BigQuery (Google Cloud Next Extended 2017 Karachi)

Editor's Notes

  1. …refers to the vast amounts of data generated every second. We are not talking Terabytes but Zettabytes or Brontobytes. If we take all the data generated in the world between the beginning of time and 2000, the same amount of data will soon be generated every minute. New big data tools use distributed systems so that we can store and analyse data across databases that are dotted around anywhere in the world.
  2. Quintillion 10^18
  3. How big is a zettabyte? One bit is binary: it is either a one or a zero. Eight bits make up one byte, and 1024 bytes make up one kilobyte. 1024 kilobytes make up one megabyte. Large videos and DVDs are measured in gigabytes, where 1024 megabytes make up one gigabyte of storage space. These days we have USB drives or memory sticks that can store a few dozen gigabytes of information, while computers and hard drives now store terabytes of information. One terabyte is 1024 gigabytes. 1024 terabytes make up one petabyte, and 1024 petabytes make up an exabyte. Think of a big urban city or a busy international airport like Heathrow, JFK, O'Hare, Dubai, or O. R. Tambo in Johannesburg. Now we're talking petabytes and exabytes. All those airplanes are capturing and transmitting data. All the people in those airports have mobile devices. Also consider the security cameras and all the staff in and around the airport. A digital universe study conducted by IDC claimed digital information reached 0.8 zettabytes last year and predicted this number would grow to 35 zettabytes by 2020. It is predicted that by 2020, one tenth of the world's data will be produced by machines, and most of the world's data will be produced in emerging markets. It is also predicted that the amount of data produced will increasingly outpace available storage. Advances in cloud computing have contributed
  4. Refers to the different types of data we can now use. In the past we focused only on structured data that fitted neatly into tables or relational databases, such as financial data. In fact, 80% of the world's data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types, such as messages, social media conversations, photos, sensor data, and video or voice recordings.
  5. Big Data veracity refers to the biases, noise and abnormality in data, that is, to its messiness or trustworthiness. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but technology now allows us to work with this type of data.
  6. The first season of the show was released in 2013 and it was an immediate hit. At the time, the New York Times reported that Netflix executives knew that House of Cards would be a hit before they even filmed it. But how did they know that? Big data. Netflix has a lot of data. Netflix knows the time of day when movies are watched. It logs when users pause, rewind and fast forward. It has ratings from millions of users as well as information on the searches they make. By looking at all this data, Netflix knew that many of its users had streamed the work of David Fincher and that films featuring Kevin Spacey had always done well. It knew that the British version of House of Cards had also done well. It also knew that people who liked Fincher also liked Spacey. All this information suggested that buying the series would be a good bet for the company, and in fact it was. In other words, thanks to big data, Netflix knows what people want before they do.
  7. Better understand and target customers: To better understand and target customers, companies expand their traditional data sets with social media data, browser logs, text analytics or sensor data to get a more complete picture of their customers. The big objective, in many cases, is to create predictive models. Using big data, telecom companies can now better predict customer churn, retailers can predict what products will sell, and car insurance companies can understand how well their customers actually drive.
     Understand and optimize business processes: Big data is also increasingly used to optimize business processes. Retailers are able to optimize their stock based on predictive models generated from social media data, web search trends and weather forecasts. Another example is supply chain or delivery route optimization using data from geographic positioning and radio frequency identification sensors.
     Improving health: The computing power of big data analytics enables us to find new cures and to better understand and predict disease patterns. We can use all the data from smart watches and wearable devices to better understand links between lifestyles and diseases. Big data analytics also allows us to monitor and predict epidemics and disease outbreaks, simply by listening to what people are saying (e.g. “Feeling rubbish today - in bed with a cold”) or searching for on the Internet (e.g. “cures for flu”).
     Improving security and law enforcement: Security services use big data analytics to foil terrorist plots and detect cyber attacks. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data analytics to detect fraudulent transactions.
     Improving sports performance: Most elite sports have now embraced big data analytics. Many use video analytics to track the performance of every player in a football or baseball game, sensor technology is built into sports equipment such as basketballs or golf clubs, and many elite sports teams track athletes outside of the sporting environment, using smart technology to track nutrition and sleep, as well as social media conversations to monitor emotional wellbeing.
  8. Apache Kafka, Elasticsearch, Cassandra, Mesos
  9. Apache Kafka, Elasticsearch, Cassandra, Mesos
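
The unit ladder in note 3 can be checked directly. This sketch uses the binary convention from the note, where each step is a factor of 1024; under the SI convention a zettabyte is 10^21 bytes, which gives slightly smaller figures. `bytes_in` is an illustrative helper name.

```python
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]

def bytes_in(unit):
    """Number of bytes in one unit, with each step a factor of 1024."""
    return 1024 ** (UNITS.index(unit) + 1)

for unit in UNITS:
    print(f"1 {unit:<9} = {bytes_in(unit):,} bytes")

# How many one-terabyte drives it would take to hold one zettabyte.
terabyte_drives = bytes_in("zettabyte") // bytes_in("terabyte")
print(terabyte_drives)
```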