1
FUNDAMENTALS OF BIG DATA
2
The Story of Big Data
3
Introduction
In 2005, Mark Kryder observed that
magnetic disk storage capacity was
increasing very quickly: “Inside of a
decade and a half, hard disks had
increased their capacity 1,000-fold.”
Intel co-founder Gordon Moore
called this rate of increase
“flabbergasting.”
4
Being Data-driven
● Companies have taken advantage of this ability to store and quickly access massive
amounts of data. Every day, across the globe, we create 2.5 quintillion
bytes of data (2.5 exabytes).*
● Data-driven companies have demanded new data-processing and analysis techniques
that can scale to handle very large computing workloads.
● The ensuing explosion in the amount and variety of data available, and the challenges
of processing and analyzing it, led to the concept of big data.
*IBM
5
A Few Big Data Success Stories
● PredPol Inc., the Los Angeles and Santa Cruz police departments, and a team of
educators created software to analyze crime data, and predict where crimes are
likely to occur down to 500 square feet. In areas in LA where the software is being
used, there's been a 33% reduction in burglaries and a 21% reduction in violent
crimes.
● The Tesco supermarket chain collected 70 million data points (such as energy
consumption) from its refrigerators and learned to predict when the refrigerators
need servicing to cut down on energy costs.
Source: searchcio.techtarget.com
6
3 Key Things Driving The Growth of Big Data
1. People
● Using mobile phones, the Internet, and a variety of other things, billions of people are creating and
consuming information faster than ever before in history.
2. Organizations
● Some companies have established dominant positions as leaders in their markets by successfully
mastering a variety of complex data types and tools to run operations and derive business
intelligence insights.
● Most companies are not equipped to handle the vast amount of data available.
3. Sensors and beacons
● A sensor detects changes in its environment and converts this to information. A common example
is a motion detector.
● A beacon gives off a signal that’s detected by a sensor. One example is a Bluetooth® beacon.
● These devices have become smaller, cheaper, and more prevalent, and they generate mountains of
data.
7
Big Data Is a Broad Term
Terabytes (TBs) or petabytes (PBs) of data are usually considered big data, but a
100-gigabyte (GB) relational database could also be a big data problem.
If you have GBs of data per second coming in that you need to process and store, you
have a big data problem.
Even if you only have a moderate amount of data—if you have to repeatedly process and
analyze it, you might have a big data problem.
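To make the "GBs of data per second" case concrete, the short sketch below converts a sustained ingest rate into a daily volume. The function name and the use of decimal units (1 TB = 1,000 GB) are assumptions made for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_gb_per_second: float) -> float:
    """Convert a sustained ingest rate in GB/s to TB per day.

    Assumes decimal units (1 TB = 1,000 GB).
    """
    return rate_gb_per_second * SECONDS_PER_DAY / 1000


# A steady 1 GB/s stream adds up to 86.4 TB every day.
print(daily_volume_tb(1))
```

Even a modest-sounding rate therefore produces a volume that has to be stored and processed somewhere every single day.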
8
What Makes Data Big?
Big data describes situations that arise when your datasets become so large that
traditional tools, such as relational databases, can no longer adequately process them.
This could be because of:
● Volume: Your dataset is so large that it no longer fits on a single computer or
relational database.
● Velocity: Data comes in rapidly or changes so often that you can’t process it fast
enough for it to be useful.
● Variety: Data comes from a variety of sources and in different formats, which
require different types of processing.
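The variety problem can be sketched in a few lines: two sources describe the same kind of reading in different formats and with different field names, so each needs its own parsing step before the data can be analyzed together. The sample data, field names, and function names below are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample inputs: the same kind of reading in two formats.
CSV_SOURCE = "device_id,temp_c\nfridge-1,4.5\nfridge-2,6.0\n"
JSON_SOURCE = '[{"device": "fridge-3", "temperature": 5.2}]'


def from_csv(text):
    """Parse CSV rows into the common record shape."""
    return [
        {"device_id": row["device_id"], "temp_c": float(row["temp_c"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Parse JSON objects, renaming fields to the common record shape."""
    return [
        {"device_id": item["device"], "temp_c": float(item["temperature"])}
        for item in json.loads(text)
    ]


# Once normalized, records from both sources can be processed together.
records = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
print(records)
```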
9
Image from TechTarget: “What is big data?”
10
Other Factors Impacting Big Data Management
● Value: Extracting insight from large datasets.
● Valence: The ease with which data can be combined with other
data and made more valuable (its connectedness).
● Veracity: Maintaining data integrity and accuracy.
● Viscosity: How much data resists being moved from one storage
system to another.
11
Impact of Big Data
Big data issues impact all phases of data handling, including:
● Monitoring
● Collection
● Storage
● Processing
● Analysis
● Reporting
This greatly complicates the information technology (IT) job, demanding more expertise
from IT professionals.
12
Trends in Big Data
Analysts agree that the amount of data generated every year will continue to grow
massively for the foreseeable future. This will create new opportunities to
capitalize on business insights gathered from data.
The number and variety of data sources are also likely to keep growing.
Adoption of cloud computing will continue to increase as cloud tools become
cheaper and easier to use. In contrast, on-premises systems are not
likely to become significantly easier to set up and use.
13
Big Data Market
International Data Corporation forecasts that the big data technology and services market
will grow about 23% per year, with annual spending reaching $48.6 billion in 2019.
There are many companies offering services in different areas in the big data industry.
Review this overview of big data vendors and technologies provided by Capgemini.
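The growth forecast above is compound growth, which can be checked with simple arithmetic. The sketch below assumes, purely for illustration, a constant 23% rate over five years ending in 2019; the five-year window and the function names are assumptions, not part of the forecast.

```python
def value_before_growth(final_value, annual_rate, years):
    """Work backward from a final value to the implied starting value."""
    return final_value / (1 + annual_rate) ** years


def value_after_growth(start_value, annual_rate, years):
    """Apply constant compound growth to a starting value."""
    return start_value * (1 + annual_rate) ** years


# Assumption: five years of constant 23% growth ending at $48.6 billion.
implied_start = value_before_growth(48.6, 0.23, 5)
print(round(implied_start, 1))  # implied starting market size, in $ billions

# Sanity check: growing the implied start forward recovers the 2019 figure.
print(round(value_after_growth(implied_start, 0.23, 5), 1))
```

At 23% per year a market roughly triples in five years, which is the point the forecast is making.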
14
Big Data Complexity Creates IT Opportunities
[Diagram: the big data pipeline]
Stages: Data sources → Ingest → Process → Store → Analyze → Visualize
● Traditional data: transaction data (OLTP); application data (ERP, CRM); third-party data
● New data sources: machine data; docs, emails; social data; sensor data; weblogs, clickstream data; images, videos
● Other components shown: data ingest apps; data replication; data integration (ETL); transformations; load; data quality; data prep; real-time processing and analytics; stream computing; flat files; operational DBs; ERP, CRM DBs; NoSQL DBs; EDW; data marts; staging | exploration | archiving; operational systems (OLTP); analytical (OLAP) systems; advanced analytics; discovery & exploration; modeling & predictive analytics; reporting; dashboards; actionable insights
● Also labeled: data management, governance, security
15
Big Data Has Been Inaccessible to Most Businesses
● Big data is difficult: It requires experts to manage a complex, distributed
computing infrastructure. These specialists are expensive and difficult to hire, and
the work takes a lot of time.
● Big data is expensive: Costs tend to grow with the volume, velocity, and variety of
data. And computing resources must be provisioned for peak demand. That means
you might have to purchase more computing resources than you need most of the
time.
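The cost of provisioning for peak demand can be illustrated with a quick utilization calculation. The demand figures below are hypothetical and only show the shape of the problem: capacity is sized for the busiest period, so much of it sits idle the rest of the time.

```python
# Hypothetical demand samples (for example, servers needed in each time slot).
demand = [20, 20, 30, 50, 100, 80, 60, 40]


def utilization(samples):
    """Average demand divided by peak demand (capacity sized for the peak)."""
    return (sum(samples) / len(samples)) / max(samples)


print(f"Provisioned for peak: {max(demand)} units")
print(f"Average utilization: {utilization(demand):.0%}")
```

With these sample numbers, half of the purchased capacity goes unused on average.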
Confidential & Proprietary | Google Cloud Platform
16
Complexities of Big Data Processing
Typical big data processing tasks:
● Programming
● Resource provisioning
● Performance tuning
● Monitoring
● Reliability
● Deployment & configuration
● Handling growing scale
● Utilization improvements
17
Big Data Processing on a Cloud Platform
Programming
Focus on insight,
not infrastructure
18
Recast Big Data Problems as Data Science Opportunities
Your role in sales is to:
● Help identify and scope the big data problem
● Help your customer see it as solvable with data science
● Help your customer see an opportunity to get an advantage over the competition
19
Big Data Use Cases
Data Engineering | Data Science
Data Reference Architecture
● Cloud Pub/Sub: Asynchronous messaging
● Cloud Storage: Raw log storage
● Cloud Dataflow: Parallel data processing
● BigQuery: Analytics engine
● Cloud Machine Learning: Train models
Batch Pipeline
Cloud Storage: Performant, unified, and cost-effective object storage
● Enable object lifecycle management across classes
● Single API across all storage classes
● Millisecond time to first byte for every class
Cloud Pub/Sub: Scalable Event Ingestion and Delivery
● Open APIs
● Global, fully managed event delivery
● Integrated with Cloud Dataflow for stream processing
BigQuery
● Scale from GB to PB with zero operations
● Fully managed SQL data warehouse
● OLAP analytics engine
Bigtable: Fully Managed NoSQL Database
● Supports open source HBase API and integrates with GCP data solutions
● Fully managed NoSQL, wide-column database for TB to PB datasets
● Single indexed schema for thousands of columns, millions of rows
● Low latency and high throughput, millions of operations per second
[Diagram: example architecture for user engagement analytics and abuse detection]
● People: users, devs, data scientists, business
● Inputs: app events, user interactions
● Cloud Pub/Sub: Topic 1 and Topic 2, each with ACLs
● Cloud Dataflow: streaming and batch
● Storage services: Cloud Storage, Cloud Datastore, Cloud SQL
● Also shown: BigQuery, BigQuery engine, open source orchestration, connectors, business dashboard, data science tools
26
Big Data Use Cases
To recap, the concept of big data covers a lot of ground and generally refers to the
collection, storage, processing, analysis, and visualization of very large and very fast-
moving datasets. Big data use cases span every industry as businesses increasingly
look to differentiate their offerings by extracting insight from the data in their business.
The following slides describe some popular big data use cases.
27
Use Case: Extract, Transform, and Load
Whenever you’re managing a massive amount of data, you’re
going to need to:
1. Extract a lot of raw data from disparate sources.
2. Transform that data into a form that can be used for your
business operations or analysis, perhaps by aggregating
or cleansing it.
3. Load that data into your data warehouse so you can use
it.
ETL generally refers to the process of moving data from
source systems into a data warehouse. An alternative is ELT,
in which the unprepared data is loaded into the data warehouse
first and then prepared there.
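The three ETL steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production pattern: the sample CSV data, the table layout, and the use of an in-memory SQLite database as a stand-in for a data warehouse are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent formatting and a missing value.
RAW_ORDERS = "order_id,customer,amount\n1, alice ,10.50\n2,BOB,4.25\n3,alice,\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse names, convert types, and drop incomplete rows."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with no amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_ORDERS)), conn)

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 10.5), ('bob', 4.25)]
```

In an ELT variant, the raw rows would be loaded first and the cleansing would run as queries inside the warehouse.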
28
Use Case: 360-degree Customer View
A 360-degree customer view is the attempt to get a
complete view of customers by combining data from
various touch points, such as marketing and the
purchasing process. Businesses use a 360-degree
customer view to drive better engagement, more revenue,
and long-term loyalty. It’s used by:
● Financial service businesses to determine the best
financial packages—insurance, investments, and so
on—to sell to specific customers.
● Retail businesses to determine the best times to
make special offers to maximize sales.
● Enterprise businesses to determine customer
retention and upsell strategies.
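Combining touch points into one customer view amounts to joining records from different systems on a shared customer identifier. The sketch below uses hypothetical marketing and purchase records and a hypothetical `customer_id` key to show the idea.

```python
from collections import defaultdict

# Hypothetical records from two separate touch points.
marketing_events = [
    {"customer_id": "c1", "campaign": "spring-sale"},
    {"customer_id": "c2", "campaign": "newsletter"},
]
purchases = [
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c1", "amount": 12.5},
]


def build_customer_view(marketing, purchase_records):
    """Merge both sources into one profile per customer."""
    view = defaultdict(lambda: {"campaigns": [], "total_spent": 0.0})
    for event in marketing:
        view[event["customer_id"]]["campaigns"].append(event["campaign"])
    for purchase in purchase_records:
        view[purchase["customer_id"]]["total_spent"] += purchase["amount"]
    return dict(view)


customer_view = build_customer_view(marketing_events, purchases)
print(customer_view["c1"])  # {'campaigns': ['spring-sale'], 'total_spent': 42.5}
```

A real system would add many more sources, and matching customers across them is usually the hard part.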
29
Use Case: Fraud Detection
Fraud detection is the process of identifying anomalies in patterns of behavior that signal
potential fraud. Today, fraud detection can involve analyzing large volumes of data, such as:
● Transactions
● Authorization information
● Buying patterns
For example, it’s used by:
● Credit card companies to prevent unauthorized purchases that don’t match a
customer’s profile.
● Financial service businesses to prevent illegal financial transactions.
● Technology businesses to prevent unauthorized access to products and services, such
as email.
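One simple way to spot a purchase that does not match a customer's profile is to compare it with the customer's past spending. The sketch below flags amounts more than three standard deviations from the historical mean. The threshold, the sample amounts, and the function name are assumptions, and real fraud systems use far richer signals than amount alone.

```python
from statistics import mean, stdev


def flag_anomalies(history, new_amounts, threshold=3.0):
    """Return new amounts that lie more than `threshold` standard
    deviations away from the mean of the customer's history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return []  # no variation in history, so no z-score can be computed
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical purchase history for one customer.
history = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0, 24.0, 21.0]
print(flag_anomalies(history, [23.0, 950.0]))  # [950.0]
```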
30
Use Case: Saving Lives
● Sequencing a human genome—all 3 billion “letters” that denote an individual's
unique DNA sequence—is providing information that’s improving scientists'
understanding of the genetic basis of many human diseases.
● Other large-scale projects, such as the 100,000 Genomes Project, are starting to
give some families a diagnosis for a child’s mysterious condition. Participants give
consent for their genome data to be linked to information about their medical
condition and health records. The medical and genomic data is shared with
researchers to improve knowledge of the causes, treatment, and care of diseases.
31
Other Use Cases
Source: A.T. Kearney Analysis
32
Common Big Data Terms
33
Common Big Data Terms - I
● Node: Usually a device on a network. A node on the Internet is anything that has an
IP address.
● Distributed processing: The method of spreading data-processing capabilities
across a set of networked computers.
● Batch processing: Processing of sets of data instead of single units to maximize
efficiency.
● Stream processing: Continuous and automatic processing of data as it’s captured,
in order to generate systematic output.
● Massively parallel processing (MPP): The use of a large number of distributed
computers to perform a set of coordinated computations in parallel
(simultaneously).
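The difference between batch, stream, and parallel processing can be shown on the same small dataset. This is a single-machine sketch: the threads only illustrate the split-and-combine pattern behind distributed and massively parallel processing, which in practice runs across many computers. All names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

readings = [3, 5, 8, 2, 7, 1]


def batch_total(items):
    """Batch: process the whole set at once and produce one result."""
    return sum(items)


def stream_totals(items):
    """Stream: update the result as each item arrives."""
    running = 0
    for item in items:
        running += item
        yield running


def parallel_total(items, workers=3):
    """Parallel: split the data, sum each part concurrently, then combine."""
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


print(batch_total(readings))          # 26
print(list(stream_totals(readings)))  # [3, 8, 16, 18, 25, 26]
print(parallel_total(readings))       # 26
```

All three approaches reach the same total. They differ in when results become available and in how the work is divided.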
34
Common Big Data Terms - II
Data collection: The process of gathering data from a variety of sources, in structured
or unstructured formats, for the purposes of analysis and evaluation.
Also called data capture.
Data aggregation: The process of compiling information from multiple databases to
create a combined dataset, usually for data processing, reporting, or analysis.
Data pipeline: Executable code defining a set of data-processing steps for transforming
data.
Machine data: Records of the activity and behavior of customers, users, transactions,
applications, servers, networks, and mobile devices. It includes configurations, data from
APIs, message queues, change events, the output of diagnostic commands, call detail
records, sensor data from industrial systems, and more.
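A data pipeline in the sense defined above can be as simple as an ordered list of transformation steps applied one after another. The step names, the sample sensor readings, and the temperature threshold in this sketch are hypothetical.

```python
from functools import reduce


def drop_missing(rows):
    """Step 1: remove records with no value."""
    return [r for r in rows if r.get("value") is not None]


def to_celsius(rows):
    """Step 2: convert Fahrenheit values to Celsius."""
    return [{**r, "value": round((r["value"] - 32) * 5 / 9, 1)} for r in rows]


def only_above(threshold):
    """Step 3 (configurable): keep records above a threshold."""
    def step(rows):
        return [r for r in rows if r["value"] > threshold]
    return step


def run_pipeline(rows, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, rows)


raw = [
    {"sensor": "a", "value": 212.0},
    {"sensor": "b", "value": None},
    {"sensor": "c", "value": 32.0},
]
result = run_pipeline(raw, [drop_missing, to_celsius, only_above(50)])
print(result)  # [{'sensor': 'a', 'value': 100.0}]
```

Keeping each step as a small function makes it easy to reorder, test, or replace steps independently.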
35
Common Big Data Terms - III
Data science: The field of study of where information comes from, what
it represents, and how it can be turned into valuable insights.
Data lake: A storage repository that holds a vast amount of raw data in its native format,
including structured, semistructured, and unstructured data. Data is extracted from a
data lake as needed and transformed into the format used in downstream processing.
Data monitoring: A business practice in which critical business data is routinely checked
against quality control rules to make sure it is always high quality and meets previously
established standards for formatting and consistency.
Data warehouse: A system used for reporting and data analysis. It acts as a central
repository of integrated data from one or more disparate sources.
Vanilla: Refers to an installation that is straight from the source, contains no
customization, and isn’t distributed by a third party.
36
Common Data Analytics Terms
37
Common Data Analytics Terms- I
Data mart: The part of a data warehouse that’s used to get data out to users. It is
usually oriented to a specific business line or team.
Statistical computing: The interface between statistics and computer science. It’s the
area of computational science (or scientific computing) specific to the mathematical
science of statistics.
Web, mobile, and commerce analytics: The measurement, collection, analysis, and
reporting of web, mobile, or commerce data for purposes of understanding and
optimizing usage.
Online analytical processing (OLAP): An approach to answering analytical queries
swiftly as part of the broader category of business intelligence. Typical applications of
OLAP include business reporting for sales, marketing, management reporting, business
process management, budgeting and forecasting, financial reporting, and similar areas.
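A typical OLAP-style query summarizes a measure along one or more dimensions. The sketch below rolls up hypothetical sales facts by region and by quarter. It illustrates the kind of question OLAP answers and does not show how an OLAP engine is implemented.

```python
from collections import defaultdict

# Hypothetical fact records: two dimensions (region, quarter), one measure.
sales = [
    {"region": "EMEA", "quarter": "Q1", "revenue": 100},
    {"region": "EMEA", "quarter": "Q2", "revenue": 150},
    {"region": "APAC", "quarter": "Q1", "revenue": 80},
    {"region": "APAC", "quarter": "Q2", "revenue": 120},
]


def roll_up(facts, dimensions, measure):
    """Sum a measure, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


print(roll_up(sales, ["region"], "revenue"))   # {('EMEA',): 250, ('APAC',): 200}
print(roll_up(sales, ["quarter"], "revenue"))  # {('Q1',): 180, ('Q2',): 270}
```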
38
Common Data Analytics Terms- II
Real time: Near-zero latency, with access to data whenever it is required.
Business insights can then be understood as events happen
rather than after they have taken place. Analytics processing jobs used to take hours or
days, often rendering critical business information no longer useful.
Speech and vision recognition, and natural language processing: 3 core areas of
machine learning that rely on huge amounts of training data and must process large
amounts of data in real time.
39
Test Yourself: Can You Define Everything Shown?
[Diagram: the big data pipeline from the “Big Data Complexity Creates IT Opportunities” slide]
Stages: Data sources → Ingest → Process → Store → Analyze → Visualize
● Traditional data: transaction data (OLTP); application data (ERP, CRM); third-party data
● New data sources: machine data; docs, emails; social data; sensor data; weblogs, clickstream data; images, videos
● Other terms shown: data ingest apps; data replication; data integration (ETL); transformations; load; data quality; data prep; real-time processing and analytics; stream computing; flat files; operational DBs; ERP, CRM DBs; NoSQL DBs; EDW; data marts; staging | exploration | archiving; operational systems (OLTP); analytical (OLAP) systems; advanced analytics; discovery & exploration; modeling & predictive analytics; reporting; dashboards; actionable insights
● Also labeled: data management, governance, security
40
Additional Resources
● Big data assets on the partner portal
● Google Cloud Platform big data one pager
● Big Data and the Creative Destruction of Today’s Business Models
● Public data sets for use by anyone for analyzing problems
● Video: What is Big Data? Can it help us solve some of society’s big challenges?
● Video: Deep Learning: Intelligence from Big Data
● Online Harvard course on Data Science
● Interesting big data infographic
● How big data is changing the database landscape

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 

Recently uploaded (20)

20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 

Fundamentals of Big Data

  • 2. 2 The Story of Big Data
  • 3. 3 Introduction In 2005, Mark Kryder observed that magnetic disk storage capacity was increasing very quickly: “Inside of a decade and a half, hard disks had increased their capacity 1,000-fold.” Intel co-founder Gordon Moore called this rate of increase “flabbergasting.”
  • 4. 4 Being Data-driven ● Companies have taken advantage of this ability to store and quickly access massive amounts of data, so much so that every day across the globe we create 2.5 quintillion bytes of data (2.5 exabytes).* ● Data-driven companies have demanded new data-processing and analysis techniques that can scale to handle very large computing workloads. ● This ensuing explosion in the amount and variety of data available, and the challenges of processing and analyzing it, led to the concept of big data. *IBM
  • 5. 5 A few Big Data Success Stories ● PredPol Inc., the Los Angeles and Santa Cruz police departments, and a team of educators created software to analyze crime data, and predict where crimes are likely to occur down to 500 square feet. In areas in LA where the software is being used, there's been a 33% reduction in burglaries and a 21% reduction in violent crimes. ● The Tesco supermarket chain collected 70 million data points (such as energy consumption) from its refrigerators and learned to predict when the refrigerators need servicing to cut down on energy costs. Source: searchcio.techtarget.com
  • 6. 6 3 Key Things Driving The Growth of Big Data 1. People ● Using mobile phones, the Internet, and a variety of other things, billions of people are creating and consuming information faster than ever before in history. 2. Organizations ● Some companies have established dominant positions as leaders in their markets by successfully mastering a variety of complex data types and tools to run operations and derive business intelligence insights. ● Most companies are not equipped to handle the vast amount of data available. 3. Sensors and beacons ● A sensor detects changes in its environment and converts this to information. A common example is a motion detector. ● A beacon gives off a signal that’s detected by a sensor. One example is a Bluetooth® beacon. ● These devices have become smaller, cheaper, and more prevalent, and they generate mountains of data.
  • 7. 7 Big Data Is a Broad Term. Terabytes (TBs) or petabytes (PBs) of data are usually considered big data, but a 100-gigabyte (GB) relational database could also be a big data problem. If you have GBs of data per second coming in that you need to process and store, you have a big data problem. Even if you only have a moderate amount of data, you might have a big data problem if you have to process and analyze it repeatedly.
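A rough calculation shows why a steady inflow of gigabytes per second quickly becomes a big data problem. The sketch below assumes decimal units (1 TB = 1,000 GB) and a hypothetical ingest rate of 1 GB per second; neither figure comes from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
GB_PER_TB = 1_000               # decimal (SI) units, an assumption


def daily_volume_tb(rate_gb_per_second: float) -> float:
    """Return the data volume accumulated in one day, in terabytes."""
    return rate_gb_per_second * SECONDS_PER_DAY / GB_PER_TB


if __name__ == "__main__":
    # A hypothetical 1 GB/s feed accumulates about 86.4 TB per day.
    print(daily_volume_tb(1.0))
```

At that assumed rate, a single day of input already exceeds what a typical single machine can store, which is the "volume" and "velocity" pressure the slide describes.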
  • 8. 8 What Makes Data Big? Big data describes situations that arise when your datasets become so large that traditional tools, such as relational databases, can no longer adequately process data. This could be because of: ● Volume: Your dataset is so large that it no longer fits on a single computer or relational database. ● Velocity: Data comes in rapidly or changes so often that you can’t process it fast enough for it to be useful. ● Variety: Data comes from a variety of sources and in different formats, which require different types of processing.
  • 9. 9 Image from TechTarget: “What is big data?”
  • 10. 10 Other factors Impacting Big Data Management ● Value: Extracting insight from large datasets. ● Valence: The ease with which data can be moved from one storage system to another. ● Veracity: Maintaining data integrity and accuracy. ● Viscosity: The ease with which data can be combined with other data and made more valuable.
  • 11. 11 Impact of Big Data Big data issues impact all phases of data handling, including: ● Monitoring ● Collection ● Storage ● Processing ● Analysis ● Reporting This greatly complicates the information technology (IT) job, demanding more expertise from IT professionals.
  • 12. 12 Trends in Big Data Analysts agree that the amount of data generated every year will continue to grow massively for the foreseeable future. This will create new opportunities to capitalize on business insights gathered from data. The number of data sources is also likely to keep growing. Adoption of cloud computing will continue to increase as cloud tools become cheaper and easier to use. In contrast, on-premises systems are not likely to become significantly easier to set up and use.
  • 13. 13 Big Data Market International Data Corporation forecasts that the big data technology and services market will grow about 23% per year, with annual spending reaching $48.6 billion in 2019. There are many companies offering services in different areas in the big data industry. Review this overview of big data vendors and technologies provided by Capgemini.
  • 14. 14 Big Data Complexity Creates IT Opportunities [Diagram: a data pipeline in six stages (Data sources, Ingest, Process, Store, Analyze, Visualize). Data sources shown: traditional data (transaction data (OLTP), application data (ERP, CRM), third-party data) and new data sources (machine data; docs, emails; social data; sensor data; weblogs, clickstream data; images, videos). Other components shown: data ingest apps, data replication, data integration (ETL) with transformations and load, data quality, data prep, stream computing, real-time processing and analytics, flat files, operational systems (OLTP), operational DBs, ERP and CRM DBs, NoSQL DBs, staging, exploration, and archiving, EDW, data marts, analytical (OLAP) systems, advanced analytics, reporting, dashboards, discovery and exploration, modeling and predictive analytics, actionable insights, and data management, governance, and security.]
  • 15. 15 Big Data has Been Inaccessible to Most Businesses ● Big data is difficult: It requires experts to manage a complex, distributed computing infrastructure. These specialists are expensive and difficult to hire, and the work takes a lot of time. ● Big data is expensive: Costs tend to grow with the volume, velocity, and variety of data. And computing resources must be provisioned for peak demand. That means you might have to purchase more computing resources than you need most of the time.
  • 16. 16 Complexities of Big Data Processing Typical big data processing tasks: ● Programming ● Resource provisioning ● Performance tuning ● Monitoring ● Reliability ● Deployment & configuration ● Handling growing scale ● Utilization improvements
  • 17. 17 Big Data Processing on a Cloud Platform Programming Focus on insight, not infrastructure
  • 18. 18 Recast Big Data Problems as Data Science Opportunities Your role in sales is to: ● Help identify and scope the big data problem ● Help your customer see it as solvable with data science ● Help your customer see an opportunity to get an advantage over the competition
  • 20. Data Engineering Data Science Data Reference Architecture ● Cloud Pub/Sub: Asynchronous messaging ● Cloud Storage: Raw log storage ● Cloud Dataflow: Parallel data processing ● BigQuery: Analytics engine ● Cloud Machine Learning: Train models ● Batch pipeline
  • 21. Cloud Storage: Performant, unified, and cost-effective object storage ● Enable object lifecycle management across classes ● Single API across all storage classes ● Millisecond time to first byte for every class
  • 22. Cloud Pub/Sub - Scalable Event Ingestion and Delivery ● Open APIs ● Global, fully managed event delivery ● Integrated with Cloud Dataflow for stream processing
  • 23. BigQuery ● Scale from GB to PB with zero operations ● Fully managed SQL data warehouse ● OLAP analytics engine
  • 24. Bigtable: Fully Managed NoSQL Database ● Supports open source HBase API and integrates with GCP data solutions ● Fully managed NoSQL, wide-column database for TB to PB datasets ● Single indexed schema for thousands of columns, millions of rows ● Low latency and high throughput, millions of operations per second
  • 25. [Architecture diagram. Labels shown: app events and user interactions; Cloud Pub/Sub with Topic 1 and Topic 2, each with ACLs; Cloud Dataflow (streaming and batch); open source orchestration and connectors; storage services (Cloud Storage, Cloud Datastore, Cloud SQL); BigQuery engine; abuse detection; user engagement analytics; business dashboard; data science tools; users, devs, data scientists, and business.]
  • 26. 26 Big Data Use Cases To recap, the concept of big data covers a lot of ground and generally refers to the collection, storage, processing, analysis, and visualization of very large and very fast-moving datasets. Big data use cases span every industry as businesses increasingly look to differentiate their offerings by extracting insight from the data in their business. The following slides describe some popular big data use cases.
  • 27. 27 Use Case: Extract, Transform, and Load Whenever you’re managing a massive amount of data, you’re going to need to: 1. Extract a lot of raw data from disparate sources. 2. Transform that data into a form that can be used for your business operations or analysis, perhaps by aggregating or cleansing it. 3. Load that data into your data warehouse so you can use it. ETL is a process that generally refers to moving data. Sometimes people use what's called ELT, where they load the unprepared data into a data warehouse and then prepare it there. It's an alternative to ETL.
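The three ETL steps can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the CSV content, the `sales_by_store` table name, and the use of an in-memory SQLite database as a stand-in for a data warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; real sources would be files, APIs, or databases.
RAW_CSV = """store,amount
north, 10.50
south,4.25
north,
south,5.75
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing amounts, then total amounts per store."""
    totals = {}
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # cleansing step: skip incomplete records
        store = row["store"].strip()
        totals[store] = totals.get(store, 0.0) + float(amount)
    return sorted(totals.items())


def load(records, connection):
    """Load: write the prepared records into a warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_store (store TEXT, total REAL)"
    )
    connection.executemany("INSERT INTO sales_by_store VALUES (?, ?)", records)
    connection.commit()


def run_etl(text=RAW_CSV):
    """Run extract, transform, and load, then read back the loaded table."""
    connection = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(transform(extract(text)), connection)
    query = "SELECT store, total FROM sales_by_store ORDER BY store"
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

In an ELT variant, the raw rows would be loaded into the warehouse first and the cleansing and aggregation would run there, for example as SQL statements.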
  • 28. 28 Use Case: 360-degree Customer View A 360-degree customer view is the attempt to get a complete view of customers by combining data from various touch points, such as marketing and the purchasing process. Businesses use a 360-degree customer view to drive better engagement, more revenue, and long-term loyalty. It’s used by: ● Financial service businesses to determine the best financial packages—insurance, investments, and so on—to sell to specific customers. ● Retail businesses to determine the best times to make special offers to maximize sales. ● Enterprise businesses to determine customer retention and upsell strategies.
  • 29. 29 Use Case: Fraud Detection Fraud detection is the process of identifying anomalies in patterns of behavior that signal potential fraud. Today, fraud detection can involve analyzing large volumes of data, such as: ● Transactions ● Authorization information ● Buying patterns For example, it’s used by: ● Credit card companies to prevent unauthorized purchases that don’t match a customer’s profile. ● Financial service businesses to prevent illegal financial transactions. ● Technology businesses to prevent unauthorized access to products and services, such as email.
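One simple way to picture "identifying anomalies in patterns of behavior" is to flag purchases that lie far from a customer's usual spending. The sketch below uses a z-score threshold, which is an illustrative technique chosen here rather than a method named in the slides; the purchase amounts and the threshold of 3 standard deviations are made up.

```python
from statistics import mean, stdev


def flag_anomalies(history, new_amounts, threshold=3.0):
    """Return new amounts more than `threshold` standard deviations
    from the mean of the customer's past purchase amounts."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs 2+ values
    if sigma == 0:
        # Every past purchase was identical, so any different amount stands out.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


if __name__ == "__main__":
    past = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]  # made-up purchase history
    # 24.0 is typical for this customer; 950.0 is far outside the usual range.
    print(flag_anomalies(past, [24.0, 950.0]))
```

Real fraud systems combine many more signals, such as the authorization information and buying patterns listed above, and typically rely on trained models rather than a single statistic.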
  • 30. 30 Use Case: Saving Lives ● Sequencing a human genome (all 3 billion “letters” that denote an individual's unique DNA sequence) is providing information that’s improving scientists' understanding of the genetic basis of many human diseases. ● Other large-scale projects, such as the 100,000 Genomes Project, are starting to give some families a diagnosis for a child’s mysterious condition. Participants give consent for their genome data to be linked to information about their medical condition and health records. The medical and genomic data is shared with researchers to improve knowledge of the causes, treatment, and care of diseases.
  • 31. 31 Other Use Cases Source: A.T. Kearney Analysis
  • 33. 33 Common Big Data Terms - I ● Node: Usually a device on a network. A node on the Internet is anything that has an IP address. ● Distributed processing: The method of spreading data-processing capabilities across a set of networked computers. ● Batch processing: Processing of sets of data instead of single units to maximize efficiency. ● Stream processing: Continuous and automatic processing of data as it’s captured, in order to generate systematic output. ● Massively parallel processing (MPP): The use of a large number of distributed computers to perform a set of coordinated computations in parallel (simultaneously).
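The difference between batch and stream processing can be illustrated with a running total. In this minimal sketch the function names and the sample readings are hypothetical: the batch version works on the complete dataset at once, while the stream version produces an updated result as each value arrives.

```python
from typing import Iterable, Iterator, List


def batch_total(readings: List[float]) -> float:
    """Batch processing: the full dataset is available and handled in one pass."""
    return sum(readings)


def stream_totals(readings: Iterable[float]) -> Iterator[float]:
    """Stream processing: emit an updated running total as each reading arrives."""
    total = 0.0
    for value in readings:
        total += value
        yield total


if __name__ == "__main__":
    data = [3.0, 1.5, 4.5]
    print(batch_total(data))          # one result after all data is collected
    print(list(stream_totals(data)))  # one result per incoming reading
```

Because `stream_totals` is a generator, it never needs the whole input in memory, which is why stream-style designs suit data that arrives continuously.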
  • 34. 34 Common Big Data Terms - II ● Data collection: The process of gathering data, structured or unstructured, from a variety of sources for the purposes of analysis and evaluation. Also called data capture. ● Data aggregation: The process of compiling information from multiple databases to create a combined dataset, usually for data processing, reporting, or analysis. ● Data pipeline: Executable code defining a set of data-processing steps for transforming data. ● Machine data: Data that records the activity and behavior of customers, users, transactions, applications, servers, networks, and mobile devices. It includes configurations, data from APIs, message queues, change events, the output of diagnostic commands, call detail records, sensor data from industrial systems, and more.
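A data pipeline in the sense defined above can be expressed as an ordered list of processing steps applied one after another. The sketch below is illustrative only: the step functions, the `user` field, and the sample events from two hypothetical sources are assumptions, and the final step doubles as a small example of data aggregation.

```python
from functools import reduce


def drop_empty(records):
    """Remove records that have no user value."""
    return [r for r in records if r.get("user")]


def normalize_user(records):
    """Trim whitespace and lowercase user names so equal users match."""
    return [{**r, "user": r["user"].strip().lower()} for r in records]


def count_by_user(records):
    """Aggregate: count events per user."""
    counts = {}
    for r in records:
        counts[r["user"]] = counts.get(r["user"], 0) + 1
    return counts


PIPELINE = [drop_empty, normalize_user, count_by_user]


def run_pipeline(records, steps=PIPELINE):
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, records)


if __name__ == "__main__":
    web_events = [{"user": "Ana "}, {"user": ""}]        # hypothetical source 1
    mobile_events = [{"user": "ana"}, {"user": "Raj"}]   # hypothetical source 2
    print(run_pipeline(web_events + mobile_events))
```

Keeping each step as a small function makes it easy to reorder, test, or replace individual stages without touching the rest of the pipeline.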
  • 35. 35 Common Big Data Terms - III ● Data science: The field of study of where information comes from, what it represents, and how it can be turned into valuable insights. ● Data lake: A storage repository that holds a vast amount of raw data in its native format, including structured, semistructured, and unstructured data. Data is extracted from a data lake as needed and transformed into the format used in downstream processing. ● Data monitoring: A business practice in which critical business data is routinely checked against quality control rules to make sure it is always high quality and meets previously established standards for formatting and consistency. ● Data warehouse: A system used for reporting and data analysis. Data warehouses are central repositories of integrated data from one or more disparate sources. ● Vanilla: Refers to an installation that is straight from the source, contains no customization, and isn’t distributed by a third party.
  • 37. 37 Common Data Analytics Terms - I ● Data mart: The part of a data warehouse that’s used to get data out to users. It is usually oriented to a specific business line or team. ● Statistical computing: The interface between statistics and computer science. It’s the area of computational science (or scientific computing) specific to the mathematical science of statistics. ● Web, mobile, and commerce analytics: The measurement, collection, analysis, and reporting of web, mobile, or commerce data for purposes of understanding and optimizing usage. ● Online analytical processing (OLAP): An approach to answering analytical queries swiftly as part of the broader category of business intelligence. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management, budgeting and forecasting, financial reporting, and similar areas.
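A typical OLAP-style question is a roll-up, such as total sales per region. The sketch below runs one such query with Python's built-in `sqlite3` module; the `sales` table, its columns, and the sample rows are hypothetical, and SQLite is only a convenient stand-in for a dedicated analytical engine.

```python
import sqlite3


def sales_rollup(rows):
    """Aggregate sales facts by region, a simple OLAP-style roll-up query."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    facts = [
        ("east", "widget", 100.0),
        ("east", "gadget", 50.0),
        ("west", "widget", 75.0),
    ]
    print(sales_rollup(facts))
```

Grouping by `product` instead of `region`, or by both, would answer a different business question from the same facts, which is the kind of flexible slicing that OLAP reporting relies on.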
  • 38. 38 Common Data Analytics Terms - II ● Real time: Near-zero latency and access to data whenever it is required. This leads to business insights being understood in real time rather than after an event has taken place. Analytics processing jobs used to take hours or days, often rendering critical business information no longer useful. ● Speech and vision recognition, and natural language processing: Three core areas of machine learning that rely on huge amounts of training data and must process large amounts of data in real time.
  • 39. 39 Test Yourself: Can You Define Everything Shown? [Diagram: a data pipeline in six stages (Data sources, Ingest, Process, Store, Analyze, Visualize). Data sources shown: traditional data (transaction data (OLTP), application data (ERP, CRM), third-party data) and new data sources (machine data; docs, emails; social data; sensor data; weblogs, clickstream data; images, videos). Other components shown: data ingest apps, data replication, data integration (ETL) with transformations and load, data quality, data prep, stream computing, real-time processing and analytics, flat files, operational systems (OLTP), operational DBs, ERP and CRM DBs, NoSQL DBs, staging, exploration, and archiving, EDW, data marts, analytical (OLAP) systems, advanced analytics, reporting, dashboards, discovery and exploration, modeling and predictive analytics, actionable insights, and data management, governance, and security.]
  • 40. 40 Additional Resources ● Big data assets on the partner portal ● Google Cloud Platform big data one pager ● Big Data and the Creative Destruction of Today’s Business Models ● Public data sets for use by anyone for analyzing problems ● Video: What is Big Data? Can it help us solve some of society’s big challenges? ● Video: Deep Learning: Intelligence from Big Data ● Online Harvard course on Data Science ● Interesting big data infographic ● How big data is changing the database landscape