Big data refers to the massive amounts of data created every day from sources like mobile devices, the internet, and sensors. As data volumes and variety have increased exponentially, traditional data processing tools are no longer adequate, which has led to new techniques for data storage, processing, and analysis that can handle "big data". Key aspects of big data include the volume, velocity, and variety of the data. Common big data use cases include customer analytics, fraud detection, and scientific research. Terms related to big data include data pipelines, distributed processing, machine learning, and data visualization.
Are you having doubts and questions about how to use Big Data in your organization? The presentation here should clear up some of them.
Feel free to comment if you have more queries, or write to us at: bigdata@xoriant.com
The new-age big data technologies include predictive analytics, NoSQL databases, search and knowledge discovery, stream analytics, in-memory data fabric, data virtualization and more.
The web-conference hosted by CRISIL Global Research & Analytics on “Big Data’s Big Impact on Businesses” on January 29, 2013, saw participation from senior officials of global multinationals from 9 countries. The presentation described how data analytics is helping businesses make “evidence-based” decisions, thereby creating a positive impact. It also spoke about the opportunities opening up in the Big Data space in India and across the globe.
Hosted by:
Sanjeev Sinha, President, CRISIL Global Research & Analytics
Gaurav Dua, Director & Practice Leader (Technology, Media & Telecom), CRISIL Global Research & Analytics
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters.
This presentation covers Big Data analytics in detail: its three key characteristics, why and where it can be used, how it is evaluated, what kinds of tools are used to store the data, its impact on the IT industry, and some applications and risk factors.
BIG DATA
Prepared By
Muhammad Abrar Uddin
Introduction
· Big Data may well be the Next Big Thing in the IT world.
· Big data burst upon the scene in the first decade of the 21st century.
· The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
· Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
What is BIG DATA?
· ‘Big Data’ is similar to ‘small data’, but bigger in size.
· Because the data is bigger, it requires different approaches:
– techniques, tools and architecture
· The aim is to solve new problems, or old problems in a better way.
· Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
What is BIG DATA
· Walmart handles more than 1 million customer transactions every hour.
· Facebook handles 40 billion photos from its user base.
· Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
Three Characteristics of Big Data: the 3 Vs
· Volume: data quantity
· Velocity: data speed
· Variety: data types
1st Character of Big Data
Volume
· A typical PC might have had 10 gigabytes of storage in 2000.
· Today, Facebook ingests 500 terabytes of new data every day.
· A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
· Smartphones and the data they create and consume, together with sensors embedded into everyday objects, will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
2nd Character of Big Data
Velocity
· Clickstreams and ad impressions capture user behavior at millions of events per second
· High-frequency stock trading algorithms reflect market changes within microseconds
· Machine-to-machine processes exchange data between billions of devices
· Infrastructure and sensors generate massive log data in real time
· Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
3rd Character of Big Data
Variety
· Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
· Traditional database systems were designed to address smaller volumes of structured data, fewer updates or a predictable, consistent data structure.
· Big Data analysis includes different types of data
Storing Big Data
· Analyzing your data characteristics
· Selecting data sources for analysis
· Eliminating redundant data
· Establishing the role of NoSQL
· Overview of Big Data stores
· Data models: key value, graph, document, column-family (see the sketch after this list)
· Hadoop Distributed File System
· H.
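As a rough, purely illustrative sketch of the data models named above (key value, document, column-family), here is the same hypothetical customer record expressed with plain Python structures; all field names and values are invented:

```python
# The same hypothetical customer purchase in three of the data models above.

# Key-value: an opaque value looked up by a single key.
kv_store = {
    "customer:42:last_purchase": '{"sku": "A-100", "amount": 19.99}',
}

# Document: a nested, self-describing record looked up by key.
doc_store = {
    "customer:42": {
        "name": "Ada",
        "purchases": [{"sku": "A-100", "amount": 19.99, "date": "2016-01-05"}],
    },
}

# Column-family (wide column): rows keyed by a row key, columns grouped into families.
wide_column_store = {
    "customer:42": {
        "profile":   {"name": "Ada", "country": "IN"},
        "purchases": {"2016-01-05#A-100": "19.99"},
    },
}

print(doc_store["customer:42"]["purchases"][0]["amount"])  # 19.99
```

A graph model would instead store the customer and the product as nodes connected by a PURCHASED edge.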
Content
1. Introduction
2. What is Big Data
3. Characteristics of Big Data
4. Storing, selecting and processing of Big Data
5. Why Big Data
6. How it is Different
7. Big Data sources
8. Tools used in Big Data
9. Applications of Big Data
10. Risks of Big Data
11. Benefits of Big Data
12. How Big Data Impacts IT
13. Future of Big Data
Introduction
In 2005, Mark Kryder observed that
magnetic disk storage capacity was
increasing very quickly: “Inside of a
decade and a half, hard disks had
increased their capacity 1,000-fold.”
Intel founder Gordon Moore
called this rate of increase
"flabbergasting.”
Being Data-driven
● Companies have taken advantage of this ability to store and quickly access massive
amounts of data—so much so that every day across the globe we create 2.5 quintillion
bytes of data (2.5 exabytes).*
● Data-driven companies have demanded new data-processing and analysis techniques
that can scale to handle very large computing workloads.
● This ensuing explosion in the amount and variety of data available, and the challenges
of processing and analyzing it, led to the concept of big data.
*IBM
A few Big Data Success Stories
● PredPol Inc., the Los Angeles and Santa Cruz police departments, and a team of
educators created software to analyze crime data, and predict where crimes are
likely to occur down to 500 square feet. In areas in LA where the software is being
used, there's been a 33% reduction in burglaries and a 21% reduction in violent
crimes.
● The Tesco supermarket chain collected 70 million data points (such as energy
consumption) from its refrigerators and learned to predict when the refrigerators
need servicing to cut down on energy costs.
Source: searchcio.techtarget.com
3 Key Things Driving The Growth of Big Data
1. People
● Using mobile phones, the Internet, and a variety of other things, billions of people are creating and
consuming information faster than ever before in history.
2. Organizations
● Some companies have established dominant positions as leaders in their markets by successfully
mastering a variety of complex data types and tools to run operations and derive business
intelligence insights.
● Most companies are not equipped to handle the vast amount of data available.
3. Sensors and beacons
● A sensor detects changes in its environment and converts this to information. A common example
is a motion detector.
● A beacon gives off a signal that’s detected by a sensor. One example is a Bluetooth® beacon.
● These devices have become smaller, cheaper, and more prevalent, and they generate mountains of
data.
Big data is A Broad Term.
Terabytes (TBs) or petabytes (PBs) of data are usually considered big data, but a 100-
gigabyte (GB) relational database could also be a big data problem.
If you have GBs of data per second coming in that you need to process and store, you
have a big data problem.
Even if you only have a moderate amount of data—if you have to repeatedly process and
analyze it, you might have a big data problem.
What Makes Data Big?
Big data describes situations that arise when your datasets become so large that
traditional tools, such as relational databases, can no longer adequately process data.
This could be because of:
● Volume: Your dataset is so large that it no longer fits on a single computer or
relational database.
● Velocity: Data comes in rapidly or changes so often that you can’t process it fast
enough for it to be useful.
● Variety: Data comes from a variety of sources and in different formats, which
require different types of processing.
Other factors Impacting Big Data Management
● Value: Extracting insight from large datasets.
● Valence: The ease with which data can be moved from one storage
system to another.
● Veracity: Maintaining data integrity and accuracy.
● Viscosity: The ease with which data can be combined with other
data and made more valuable.
Impact of Big Data
Big data issues impact all phases of data handling, including:
● Monitoring
● Collection
● Storage
● Processing
● Analysis
● Reporting
This greatly complicates the information technology (IT) job, demanding more expertise
from IT professionals.
Trends in Big Data
Analysts agree that the amount of data generated every year will continue to grow
massively for the foreseeable future. This will create new opportunities to
capitalize on business insights gathered from data.
It’s likely that the variety of sources of data will continue to grow in number.
Adoption of cloud computing will continue to increase as it becomes increasingly
cheaper and easier to use cloud tools. In contrast, on-premise systems are not
likely to become significantly easier to set up and use.
Big Data Market
International Data Corporation forecasts that the big data technology and services market
will grow about 23% per year, with annual spending reaching $48.6 billion in 2019.
There are many companies offering services in different areas in the big data industry.
Review this overview of big data vendors and technologies provided by Capgemini.
Big Data Complexity Creates IT Opportunities
[Diagram: the big data landscape, spanning Data sources → Ingest → Process → Store → Analyze → Visualize, with real-time processing and analytics running across the whole flow and data management, governance, and security as cross-cutting concerns.
● Data sources: traditional data (transaction data (OLTP), application data (ERP, CRM), third-party data, flat files) and new data sources (machine data, docs and emails, social data, sensor data, weblogs and clickstream data, images and videos).
● Ingest: data replication, data ingest apps, stream computing.
● Process: data integration (ETL), transformations, data quality, data prep.
● Store: EDW and data marts, operational (OLTP) systems and databases, ERP/CRM databases, analytical (OLAP) systems, NoSQL DBs, staging, exploration, and archiving.
● Analyze: reporting, discovery & exploration, modeling & predictive analytics, advanced analytics.
● Visualize: dashboards and actionable insights.]
Big Data has Been Inaccessible to Most Businesses
● Big data is difficult: It requires experts to manage a complex, distributed
computing infrastructure. These specialists are expensive and difficult to hire, and
the work takes a lot of time.
● Big data is expensive: Costs tend to grow with the volume, velocity, and variety of
data. And computing resources must be provisioned for peak demand. That means
you might have to purchase more computing resources than you need most of the
time.
Complexities of Big Data Processing
Typical big data processing tasks:
● Programming
● Resource provisioning
● Performance tuning
● Monitoring
● Reliability
● Deployment & configuration
● Handling growing scale
● Utilization improvements
Big Data Processing on a Cloud Platform
● Programming
● Focus on insight, not infrastructure
Recast Big Data Problems as Data Science Opportunities
Your role in sales is to:
● Help identify and scope the big data problem
● Help your customer see it as solvable with data science
● Help your customer see an opportunity to get an advantage over the competition
Data Reference Architecture
(Data engineering → data science, batch pipeline)
● Cloud Pub/Sub: asynchronous messaging
● Cloud Storage: raw log storage
● Cloud Dataflow: parallel data processing
● BigQuery: analytics engine
● Cloud Machine Learning: train models
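As a hedged sketch of how the batch path of this reference architecture can look in code, here is a minimal pipeline written with Apache Beam, the open source SDK that Cloud Dataflow executes; the bucket, project, dataset, and field names are hypothetical:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Runs locally by default; passing --runner=DataflowRunner plus project/region
    # options would execute the same pipeline on Cloud Dataflow.
    with beam.Pipeline(options=PipelineOptions()) as p:
        (
            p
            | "ReadRawLogs" >> beam.io.ReadFromText("gs://my-bucket/raw-logs/*.json")
            | "ParseJson" >> beam.Map(json.loads)
            | "KeepErrors" >> beam.Filter(lambda rec: rec.get("severity") == "ERROR")
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:analytics.error_events",
                schema="ts:TIMESTAMP,severity:STRING,message:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```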
Cloud Storage: performant, unified, and cost-effective object storage
● Enable object lifecycle management across classes
● Single API across all storage classes
● ms time to first byte for every class
Cloud Pub/Sub: Scalable Event Ingestion and Delivery
● Open APIs
● Global, fully-managed event delivery
● Integrated with Cloud Dataflow for stream processing
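A minimal publish call with the google-cloud-pubsub Python client, assuming a project and topic that already exist (the names below are hypothetical):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "user-events")  # hypothetical names

# Payloads are bytes; extra keyword arguments become string attributes on the message.
future = publisher.publish(topic_path, data=b'{"event": "click"}', user_id="42")
print("published message id:", future.result())
```

A subscriber (for example a Cloud Dataflow streaming pipeline) would then pull or receive these messages from a subscription on the topic.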
BigQuery
● Fully managed SQL data warehouse
● OLAP analytics engine
● Scale from GB to PB with zero operations
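To make the “fully managed SQL data warehouse” point concrete, here is a small query against one of Google’s public datasets using the BigQuery Python client; it assumes default credentials and a default project are configured:

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project and credentials

sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

for row in client.query(sql).result():  # result() waits for the query job to finish
    print(row.name, row.total)
```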
Bigtable: Fully Managed NoSQL Database
● Fully managed NoSQL, wide-column database for TB to PB datasets
● Supports the open source HBase API and integrates with GCP data solutions
● Single indexed schema for thousands of columns, millions of rows
● Low latency and high throughput, millions of operations per second
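A hedged sketch of a single-row write with the google-cloud-bigtable Python client; the instance, table, row key, and column names are hypothetical, and the table is assumed to already exist with a "stats" column family:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("user-stats")  # hypothetical names

# Wide-column model: one row key, cells addressed by (column family, column qualifier).
row = table.direct_row(b"user#42")
row.set_cell("stats", b"page_views", b"107")
row.set_cell("stats", b"last_seen", b"2016-01-05T10:00:00Z")
row.commit()
```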
[Diagram: an example user-engagement analytics architecture. App events and user interactions are published to Cloud Pub/Sub topics (with ACLs), processed by streaming and batch Cloud Dataflow pipelines (with open source orchestration and connectors), stored in storage services (Cloud Storage, Cloud Datastore, Cloud SQL), and analyzed in the BigQuery engine to serve business dashboards and data science tools for users, devs, data scientists, and the business, including abuse detection.]
Big Data Use Cases
To recap, the concept of big data covers a lot of ground and generally refers to the
collection, storage, processing, analysis, and visualization of very large and very fast-
moving datasets. Big data use cases span every industry as businesses increasingly
look to differentiate their offerings by extracting insight from the data in their business.
The following slides describe some popular big data use cases.
Use Case: Extract, Transform, and Load
Whenever you’re managing a massive amount of data, you’re
going to need to:
1. Extract a lot of raw data from disparate sources.
2. Transform that data into a form that can be used for your
business operations or analysis, perhaps by aggregating
or cleansing it.
3. Load that data into your data warehouse so you can use
it.
ETL is a process that generally refers to moving data.
Sometimes people use what's called ELT, where they load the
unprepared data into a data warehouse and then prepare it
there. It's an alternative to ETL.
Use Case: 360-degree Customer View
A 360-degree customer view is the attempt to get a
complete view of customers by combining data from
various touch points, such as marketing and the
purchasing process. Businesses use a 360-degree
customer view to drive better engagement, more revenue,
and long-term loyalty. It’s used by:
● Financial service businesses to determine the best
financial packages—insurance, investments, and so
on—to sell to specific customers.
● Retail businesses to determine the best times to
make special offers to maximize sales.
● Enterprise businesses to determine customer
retention and upsell strategies.
Use Case: Fraud Detection
Fraud detection is the process of identifying anomalies in patterns of behavior that signal
potential fraud. Today, fraud detection can involve analyzing large volumes of data, such as:
● Transactions
● Authorization information
● Buying patterns
For example, it’s used by:
● Credit card companies to prevent unauthorized purchases that don’t match a
customer’s profile.
● Financial service businesses to prevent illegal financial transactions.
● Technology businesses to prevent unauthorized access to products and services, such
as email.
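As a deliberately simplified illustration of “anomalies in patterns of behavior”, the sketch below flags a transaction that falls far outside a customer’s usual spending; the history, amounts, and threshold are invented:

```python
from statistics import mean, stdev

# A customer's recent transaction amounts (hypothetical history).
history = [12.50, 9.99, 15.00, 11.25, 13.40, 10.80, 14.10, 12.00]

def looks_fraudulent(amount, past, z_threshold=3.0):
    """Flag an amount more than z_threshold standard deviations from the customer's mean."""
    mu, sigma = mean(past), stdev(past)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold

print(looks_fraudulent(13.75, history))   # False: consistent with the usual pattern
print(looks_fraudulent(980.00, history))  # True: far outside the usual pattern
```

Production systems typically combine many such signals (location, merchant, time of day) and score them with trained models rather than a single threshold.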
Use Case: Saving Lives
● Sequencing a human genome—all 3 billion “letters” that denote an individual's
unique DNA sequence—is providing information that’s improving scientists'
understanding of the genetic basis of many human diseases.
● Other large-scale projects, such as the 100,000 Genomes Project, are starting to
give some families a diagnosis for a child’s mysterious condition. Participants give
consent for their genome data to be linked to information about their medical
condition and health records. The medical and genomic data is shared with
researchers to improve knowledge of the causes, treatment, and care of diseases.
Common Big Data Terms - I
● Node: Usually a device on a network. A node on the Internet is anything that has an IP address.
● Distributed processing: The method of spreading data-processing capabilities across a set of networked computers.
● Batch processing: Processing of sets of data instead of single units to maximize efficiency.
● Stream processing: Continuous and automatic processing of data as it’s captured, in order to generate systematic output.
● Massively parallel processing (MPP): The use of a large number of distributed computers to perform a set of coordinated computations in parallel (simultaneously).
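To make the batch versus stream distinction above concrete, here is a toy contrast in plain Python; the event data is invented:

```python
# A toy event source; in a real system this would be an unbounded stream.
events = ["login", "click", "click", "purchase", "click"]

# Batch processing: the full dataset is available, so compute the result in one pass.
batch_click_count = sum(1 for e in events if e == "click")
print("batch total clicks:", batch_click_count)

# Stream processing: events arrive one at a time; maintain a running result.
running_clicks = 0
for event in events:  # imagine this loop never ending
    if event == "click":
        running_clicks += 1
    print("clicks so far:", running_clicks)
```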
Common Big Data Terms - II
Data collection: The process of gathering data for the purposes of analysis and evaluation from a variety of sources, which can be structured or unstructured in format. Also called data capture.
Data aggregation: The process of compiling information from multiple databases to create a combined dataset, usually for data processing, reporting, or analysis.
Data pipeline: Executable code defining a set of data-processing steps for transforming data.
Machine data: Records the activity and behavior of customers, users, transactions, applications, servers, networks and mobile devices. It includes configurations, data from APIs, message queues, change events, the output of diagnostic commands, call detail records, sensor data from industrial systems, and more.
Common Big Data Terms - III
Data science: The field of study of where information comes from, what it represents, and how it can be turned into valuable insights.
Data lake: A storage repository that holds a vast amount of raw data in its native format, including structured, semistructured, and unstructured data. Data is extracted from a data lake as needed and transformed into the format used in downstream processing.
Data monitoring: A business practice in which critical business data is routinely checked against quality control rules to make sure it is always high quality and meets previously established standards for formatting and consistency.
Data warehouse: A system used for reporting and data analysis; data warehouses are central repositories of integrated data from one or more disparate sources.
Vanilla: Refers to an installation that is straight from the source, contains no customization, and isn’t distributed by a third party.
Common Data Analytics Terms - I
Data mart: The part of a data warehouse that’s used to get data out to users; it is usually oriented to a specific business line or team.
Statistical computing: The interface between statistics and computer science. It’s the
area of computational science (or scientific computing) specific to the mathematical
science of statistics.
Web, mobile, and commerce analytics: The measurement, collection, analysis, and
reporting of web, mobile, or commerce data for purposes of understanding and
optimizing usage.
Online analytical processing (OLAP): An approach to answering analytical queries
swiftly as part of the broader category of business intelligence. Typical applications of
OLAP include business reporting for sales, marketing, management reporting, business
process management, budgeting and forecasting, financial reporting, and similar areas.
Common Data Analytics Terms - II
Real time: Means that there is near-zero latency and access to data whenever it is required. This leads to business insights being understood in real time rather than after an event has taken place. Analytics processing jobs used to take hours or days, often rendering critical business information no longer useful.
Speech and vision recognition, and natural language processing: 3 core areas of
machine learning that rely on huge amounts of training data and must process large
amounts of data in real time.
Test Yourself: Can You Define Everything Shown?
[The slide repeats the data landscape diagram from “Big Data Complexity Creates IT Opportunities” so you can test yourself on each term.]
Additional Resources
● Big data assets on the partner portal
● Google Cloud Platform big data one pager
● Big Data and the Creative Destruction of Today’s Business Models
● Public data sets for use by anyone for analyzing problems
● Video: What is Big Data? Can it help us solve some of society’s big challenges?
● Video: Deep Learning: Intelligence from Big Data
● Online Harvard course on Data Science
● Interesting big data infographic
● How big data is changing the database landscape