Week 1
PART 1
Modern Data Ecosystem
“The constant increase in data
processing speeds and
bandwidth, the nonstop
invention of new tools for
creating, sharing, and consuming
data, and the steady addition of
new data creators and consumers
around the world, ensure that
data growth continues
unabated. Data begets more data
in a constant virtuous cycle.”
Forbes 2020 report on data in the coming decade
ENTERPRISE DATA ENVIRONMENT
• Data integrated from disparate sources
• Different types of analysis and skills to generate
insights
• Active stakeholders to collaborate and act on
insights generated
• Tools, applications, and infrastructure to store,
process, and disseminate data as required.
INTERCONNECTED INDEPENDENT CONTINUALLY EVOLVING
S 1 1 1
Data sources
text, images, videos,
clickstreams, user
conversations,
social media
platforms, the
Internet of Things (or
IoT) devices, real-
time events that
stream data, legacy
databases, and data
sourced from
professional data
providers and
agencies
Data is available
in a variety of
structured and
unstructured
datasets.
Pull a copy of the data from the
original sources into a data repository.
At this stage, you’re only looking at acquiring the data you
need—working with data formats, sources, and interfaces
through which this data can be pulled in.
Challenges!
• Reliability
• Security
• Integrity
1 2 3
Organized Cleaned up
Optimized for
access by end
users
The data will need to conform to compliances and
standards enforced in the organization. For example:
• Conforming to guidelines that regulate the storage and
use of personal data (health, biometrics, or household
data from IoT devices)
• Master data tables within the organization, to ensure
standardization of master data across all applications
and systems.
Challenges!
• Data management
• Data repositories
that provide
high availability,
flexibility,
accessibility,
and security.
Data management
Enterprise data
repositories
1 2 3
Data management
Enterprise data
repositories
Business Stakeholders
APPs
Programmers Analysts
Data science use cases
Users
Challenges!
Interfaces, APIs,
and applications
that can get this
data to the end-
users in line with
their specific
needs.
For example, Data Analysts may need the raw data to
work with, business stakeholders may need reports
and dashboards, applications may need custom APIs
to pull this data.
1 2 3
Emerging
technologies
Machine
Learning
Data Scientists are creating
predictive models by training
machine learning algorithms
on past data.
Cloud
computing
Every enterprise today has
access to limitless storage, high
performance computing, open
source technologies, machine
learning technologies, and the
latest tools and libraries.
Big data
Is paving the way for new tools
and techniques and also new
knowledge and insights.
Key Players in the Data
Ecosystem
Organizations that are using data
to uncover opportunities and are
applying that knowledge to
differentiate themselves are the
ones leading into the future.
• Identifying patterns in financial transactions
to detect fraud
• Using recommendation engines to drive
conversion
• Mining social media posts for customer
voice, or brands
• Personalizing their offers based on
customer behavior analysis
Business leaders realize that data holds the key to
competitive advantage.
S 1 1 2
Data Engineers, Data Analysts, Data Scientists, Business
Analysts, and Business Intelligence (or BI) Analysts
Data Engineers
Data Analysts
Data Scientists
Business Analysts
Business Intelligence (or BI) Analysts
Data
Professionals
Summary
• Data Engineering converts raw data into usable
data.
• Data Analytics uses this data to generate insights.
• Data Scientists use Data Analytics and Data
Engineering to predict the future using data from
the past.
• Business Analysts and Business Intelligence
Analysts use these insights and predictions to drive
decisions that benefit and grow their business.
Functions:
• Develop and maintain
data architectures
• Make data available for
business operations and
analysis
Data Engineers
Data Engineers work within the data ecosystem
to:
• Extract, integrate, and organize data
from disparate sources
• Clean, transform, and prepare data
• Design, store, and manage data in
data repositories.
They enable data to be
accessible in formats and
systems that the various
business applications, as
well as stakeholders like
Data Analysts and Data
Scientists, can utilize.
Skills:
• Good knowledge of programming
• Sound knowledge of systems and technology
architectures
• In-depth understanding of relational
databases and non-relational datastores.
Functions:
Translates data and numbers
into plain language, so
organizations can make
decisions.
Responsibilities:
• Inspect and clean data for deriving insights;
• Identify correlations, find patterns, and apply
statistical methods to analyze and
mine data;
• Visualize data to interpret and present the
findings of data analysis.
“Are the users’ search
experiences generally good or
bad with the search
functionality on our site?”
“What is the popular
perception of people regarding
our rebranding initiatives?”
“Is there a correlation between
sales of one product and
another?”
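The correlation question above can be explored with a few lines of Python. The sketch below computes the Pearson correlation coefficient by hand; the function name `pearson` and the weekly sales figures are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly unit sales for two products.
product_a = [10, 12, 15, 20, 22]
product_b = [5, 6, 8, 10, 11]

if __name__ == "__main__":
    print(round(pearson(product_a, product_b), 3))
```

A value close to 1 suggests the two products tend to sell together, a value close to -1 suggests one sells when the other does not, and a value near 0 suggests no linear relationship. Correlation alone does not show that one product's sales cause the other's.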
Skills:
• Good knowledge of spreadsheets, writing
queries, and using statistical tools to create
charts and dashboards.
• Modern data analysts also need to have some
programming skills.
• They need strong analytical and story-telling
skills.
Data Analysts
Responsibilities:
• Analyze data for actionable insights
• Build Machine Learning or Deep Learning
models that train on past data to create
predictive models
“How many new social
media followers am I
likely to get next month?”
“What percentage of my
customers am I likely
to lose to competition in
the next quarter?”
“Is this financial
transaction unusual for
this customer?”
Skills:
• Knowledge of Mathematics, Statistics
• Understanding of programming languages,
databases, and building data models.
• They also need to have domain knowledge.
Data Scientists
Responsibilities:
Leverage the work of Data
Analysts and Data Scientists to
look at possible implications
for their business and the actions
they need to take or
recommend.
Business Analysts Business Intelligence Analysts
Responsibilities:
• Focus on market forces and
external influences that shape
their business.
• Organize and monitor data on
different business functions
• Explore that data to extract
insights and actionables that
improve business performance.
What is Data Engineering?
S 1 1 3
• The field of Data Engineering concerns itself with the mechanics for the flow and
access of data
• Its goal is to make quality data available for fact-finding and data-driven
decision making
• The world of data has grown to include wide-ranging sources, structures, and
types of data.
Collecting
source data
Processing
data
Storing data
Making data
available to
users securely
The field of Data Engineering involves:
Develop tools, workflows, and processes that help you acquire data from
multiple sources.
• Design, build, and maintain scalable data architecture to store data.
• Data could be stored in databases, data warehouses, data lakes, or any
other type of data repository.
Collecting
source data
Processing
data
Storing data
Making data
available to
users securely
The field of Data Engineering involves:
Processing data includes cleaning, transforming, and preparing data so that
it is usable.
• Implement and maintain distributed systems for large-scale processing of
data.
• Design pipelines for the extraction, transformation, and loading of data
into data repositories.
• Design or implement solutions for validating and safeguarding quality,
privacy, and security of data.
• Optimize tools, systems, and workflows for performance, reliability, and
scalability.
• Ensure data meets all regulatory and compliance guidelines.
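The extract, transform, and load steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV columns (`order_id`, `amount`), the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the data repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real pipeline would read from files, APIs, or databases.
RAW_CSV = """order_id,amount
1001, 19.99
1002,
1003,5.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows without an amount, cast types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # incomplete record, skipped in this sketch
        cleaned.append((int(row["order_id"]), float(amount)))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into the repository table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?)", records)
    connection.commit()


def run_pipeline(text, connection):
    """Run the three stages in order and return the number of stored rows."""
    load(transform(extract(text)), connection)
    return connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, conn))
```

Keeping each stage in its own function makes it easier to test the stages separately and to swap the source or the destination later.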
Collecting
source data
Processing
data
Storing data
Making data
available to
users securely
The field of Data Engineering involves:
• Architect or implement data stores for the storage of processed data.
• Ensure systems are scalable, keeping in mind the evolving nature of data
and business needs.
• Ensure tools and systems are in place that take care of data privacy,
security, compliance, monitoring, backup, and recovery.
Collecting
source data
Processing
data
Storing data
Making data
available to
users securely
The field of Data Engineering involves:
• APIs, services, and programs that retrieve data on defined parameters for
use by end-users.
• Interfaces and dashboards that present data to users so they can derive
insights from the data.
• Ensure the right measures and checks and balances are in place to keep
data secure and provide rights-based access to users.
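A service that retrieves data on defined parameters while enforcing rights-based access might look like the sketch below. The role names, the `sales` table, and the `fetch_sales` function are hypothetical, chosen only for illustration; real systems typically delegate permissions to the database or to an identity service.

```python
import sqlite3

# Hypothetical mapping of roles to the columns each role may read.
ROLE_COLUMNS = {
    "analyst": ("region", "revenue"),
    "business_user": ("region",),
}


def build_demo_db():
    """Create a small in-memory table that stands in for a data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0)],
    )
    return conn


def fetch_sales(conn, role, region):
    """Return rows for one region, limited to the columns the role may see."""
    if role not in ROLE_COLUMNS:
        raise PermissionError(f"role {role!r} has no access to sales data")
    # Column names come from the fixed allow-list above, never from user input.
    columns = ", ".join(ROLE_COLUMNS[role])
    query = f"SELECT {columns} FROM sales WHERE region = ? ORDER BY revenue"
    # The region value is passed as a bound parameter to avoid SQL injection.
    return conn.execute(query, (region,)).fetchall()
```

For example, `fetch_sales(build_demo_db(), "business_user", "south")` returns only the region column, while an unknown role raises `PermissionError`.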
Architect:
Skills: To architect
any data
management
system, be it for
collating source
data or storing
processed
analysis-ready
data
To ensure data
stores are available
and optimized for
use, you need to
have expertise in
databases.
Proficiency in
database tools,
programming
languages, and
distributed
systems all
come under
data
engineering, but
they may
require different
skill sets.
• More than any other data profession, data
engineering is about the tools and
technologies involved in data manipulation.
• But it is also about understanding the complexities
of data and how it is ultimately leveraged for fact-
finding and decision-making.
Summary
S 1 1 4
Viewpoints: Defining Data
Engineering
How do you define data engineering?
Data engineering
involves
•Designing
•Building
•Maintaining data
infrastructures
and platforms
Data infrastructures
include
•Databases
•Big Data
Repositories
• Data Pipelines for
transforming and
moving data
between data
systems
Data Engineers
•Develop and
optimize data
systems
•Make data
available for
analysis
Data Analysts
•Analyze data in
data systems to
report and derive
insights
Data Scientists
•Perform deeper
analysis on data
•Develop
predictive models
to solve more
complex data
problems
How do you define data engineering?
Data Engineers ensure
• Data is highly
available
• Consistent
• Secure
• Recoverable
Data Analysts and
Data Scientists
• Make use of the
data that Data
Engineers provide
Data Engineers work with them to
• Ensure data matches
their needs
How do you define data engineering?
Data Engineering involves
•Designing
•Maintaining
•Optimizing data systems
•Selecting the right databases, storage systems and
cloud architecture
•Ensuring data flow inside an organization is
seamless
•Ensuring data can be delivered to stakeholders
when they need it
Data Analysts and Data Scientists
•Make use of data provided by Data Engineers for
purposes of analysis and predictions
How do you define data engineering?
Data Engineers
• Extract and collect raw data
from multiple sources
• Transform and store data in
table form
Data Analysts and Data
Scientists
• Perform analysis on data
prepared by Data Engineers
How do you define data engineering?
Data Engineers
• Precursor to Data
Analysis and Data
Science
Data Engineers enable
Data Analysts and
Data Scientists
• Picking the right
databases and tools
• Building the right
data pipelines
Data Analysts and
Data Scientists use
this data to
• Build reports
• Perform statistical
analysis
S 1 1 5
Viewpoints: Evolution of
Data Engineering
Can you talk about how data
engineering has evolved over the
past couple of decades?
• Data engineering landscape is very different
• The quantity of data that is handled was
unthinkable
• Variety and formats of data available have
evolved
• Big Data was unheard of
• Expectations have also evolved
• Today, there is widespread use of
automation tools
Can you talk about how data
engineering has evolved over the
past couple of decades?
• With cloud computing, data infrastructure is
now available as a service
• A DE, today, needs to do a lot less from
scratch and spend less time on setting up and
managing systems
• Before: Only relational databases and data
warehouses. Today: NoSQL and other types
of data repositories
• DE need to be able to work with Big Data
systems and pipelines
• DE need to know a variety of tools, data
systems and how to use them for solving
data problems
Can you talk about how data
engineering has evolved over the
past couple of decades?
• Before: The approach was hierarchical; Enterprise Architects or Data Architects made
decisions regarding data storage
• DE work closely with developers to see how best to meet their requirements
• Availability of different types of data sources, such as, the IoT and API feeds
• Variety of databases:
• Wide column stores (Cassandra and HBase)
• Key-value pair databases
• Hadoop for Big Data
• Doc stores such as MongoDB and Couchbase
• Evolution of data sources and the variety of data have contributed
• Before: Focused on database management, ETL pipelines and data visualization
• Now: Distributed computing, DevOps and implementation of Machine Learning models
Week 1
PART 2
Responsibilities and Skillsets of a
Data Engineer
At a broad level, Data Engineers:
• Extract, organize, and integrate data from disparate sources
• Prepare data for analysis and reporting by transforming and
cleansing it
• Design and manage data pipelines that encompass the
journey of data from source to destination systems
• Set up and manage the infrastructure required for the
ingestion, processing, and storage of data: Data Platforms,
Data Stores, Distributed systems, Data Repositories
S 1 2 1
The overarching responsibility of a data engineer is to
provide analytics-ready data to data consumers
Data is
analytics-ready
when:
• Accurate
• Reliable
• Compliant with
the regulations
that govern it
• Accessible to
consumers
when they
need it.
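Some of the criteria above can be checked automatically before data is handed to consumers. The sketch below is one possible way to do it, not a standard: the field names (`customer_id`, `amount`, `ssn`) and the rules are hypothetical and would differ in any real organization.

```python
REQUIRED_FIELDS = ("customer_id", "amount")  # hypothetical schema
RESTRICTED_FIELDS = ("ssn",)                 # hypothetical regulated field


def check_record(record):
    """Return a list of problems found in one record (empty list means it passed)."""
    problems = []
    # Accuracy and reliability: required values must be present.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        problems.append("negative amount")
    # Compliance: regulated fields must not reach analytics consumers.
    for field in RESTRICTED_FIELDS:
        if field in record:
            problems.append(f"restricted field {field} present")
    return problems


def split_ready(records):
    """Separate records that pass every check from those that need attention."""
    ready, rejected = [], []
    for record in records:
        (rejected if check_record(record) else ready).append(record)
    return ready, rejected
```

Returning the list of problems, rather than a simple pass or fail, makes it easier to report why a record was held back.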
Technical Skills
administrative tools, system utilities and
commands
load balancing and application performance
monitoring
Experience of working with ETL tools such as IBM
InfoSphere Information Server, AWS Glue, and
Improvado.
Functional Skills
• The ability to convert business requirements into technical specifications.
• The ability to work with the complete software development lifecycle which includes
ideation, architecture, design, prototyping, testing, deployment, and monitoring.
• An understanding of data’s potential application in business.
• And an understanding of the risks of poor data management which essentially covers
data quality, privacy, security, and compliance.
software
engineering
data science
DATA
ENGI
NEER
ING
• Data engineering is a team sport.
• Interpersonal skills, teamwork, and collaboration
• As a data engineer, you need to be able to
communicate effectively with both technical and
non-technical stakeholders in a manner that a clear
understanding can be established.
Soft Skills
DATA
ENGINEER
ANALYSTS
DATA
SCIENTISTS
BUSINESS
USERS
TECHNICAL
TEAMS
S 1 2 2
Viewpoints: Skills and
Qualities to be a Data
Engineer
What are some of the technical skills
required to be a successful Data
Engineer?
Data engineering
is
• A fast-moving
technology
Data Engineers
need
• To love data
Technical skills
depend
• On the job role
Experience
required in Retail
• Relational
databases
• Cassandra
• Google
Bigtable
• Kafka Stream
• WebSphere
MQ
In Healthcare or
Social Media
• Need of
different
skillsets
What are some of the technical skills
required to be a successful Data
Engineer?
• Acquiring technical skills is the easier part of
the overall skills required
• Important to understand the basic theory
about data structures and how to work with
data
• Willing to roll with change
What are some of the technical skills
required to be a successful Data
Engineer?
• Good understanding of networking, both
LAN and WAN
• Good understanding of different types of
storage, local as well as cloud-based
• Very good knowledge of operating systems
• Expertise in databases, both RDBMS and
NoSQL
• Programming skills in any language such as
Java, C, or Python
• Automation skills
Knowledge of
•One or more
programming
languages and
operating systems
Knowledge of IT
infrastructure and
landscape:
•Computer
architecture
•Cloud
•Virtual machines
•Different types of
storage
Proficiency in SQL Working knowledge of
•One or more
databases, both
relational and NoSQL
Knowledge and
experience of:
•ETL
•Data Pipelines
•Data Warehouses
•Data Lakes
•Other Big Data
systems
What are some of the technical skills required
to be a successful Data Engineer?
• Curiosity
• Questioning skills – asking questions to business and
technical users
• Enjoy problem-solving and troubleshooting
• Being good at teamwork, collaboration, and
communication
• Being good at logic and having an interest in coding
What are some of the qualities and soft skills
required to be a successful Data Engineer?
• Good problem solving and communication skills
• Detail-oriented control-freak
• There is a lot of detail involved in working with data
• It's important to get into the details and make sure that
you don't leave any of the boxes unchecked
• Being in control and aware of the choices that are being
made in the data environment, and their trade-offs
• Interact with developers
• Defend your choices to management
• Advocate for the choices you make
• Good work ethic
• Passion to learn
What are some of the qualities and soft skills
required to be a successful Data Engineer?
• Curiosity
• Good communication
• Eagerness to learn
S 1 2 3
Viewpoints: A Day in the
Life of a Data Engineer
The need
Launching a new shampoo in the
market
How a product gets talked about on social media has an immediate impact on sales numbers
and brand perception (customer sentiment).
They want to keep a close eye on all social media and know what online communities say about
their product—positive feedback, negative comments, suggestions, even comparisons with
their current favorites.
The Goal • Prototype of a dashboard using a
sentiment analysis algorithm and
dummy data to serve this goal.
• The dashboard displayed graphs of
different customer sentiments
plotted as scores across social media
sources and consumer
demographics.
• The idea was an instant hit and got a
go-ahead for implementation by the
business team.
• This is where our team of data
engineers stepped in—to make this
prototype a reality.
The Solution
My first task was to pull in data from all the social
media sites and online sources identified by the
business teams into the organization’s
environment.
I started with APIs to pull the tweets and posts with
our product’s hashtag into temporary storage. I
then turned to the eCommerce portals and product
review blogs to collect our product-specific data
and move that into temporary storage as well—an
activity known as web scraping. Tweets, posts,
comments, articles, even memes—there was quite a
mix of formats and structures.
Once I had all the data I needed, I inspected it to
assess the transformations that would need to be
performed on this data before it could be loaded into
the database.
We decided to create a Python program for
processing the data and loading it into
the database. I cleaned up the data and
transformed it into a format that would be stored in
the database. This is the database from which the
dashboard is pulling the data to display the reports.
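The deck does not show the Python program itself. As a rough sketch of what the clean-up and load step could look like, the example below maps items of different shapes onto one format, drops empty and duplicate entries, and writes the result to a table. The item fields (`source`, `text`, `body`), the `posts` table, and the sample content are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical raw items collected into temporary storage from different sources.
RAW_ITEMS = [
    {"source": "Twitter", "text": "  Love the new shampoo! #brand  "},
    {"source": "blog", "body": "Smells great,\nbut the bottle leaks."},
    {"source": "twitter", "text": "Love the new shampoo! #brand"},
    {"source": "blog", "body": ""},
]


def normalize(item):
    """Map differently shaped items onto one (source, text) format."""
    text = item.get("text") or item.get("body") or ""
    text = " ".join(text.split())  # collapse whitespace and line breaks
    return item["source"].lower(), text


def clean(items):
    """Normalize items, dropping empty texts and exact duplicates."""
    seen, rows = set(), []
    for item in items:
        row = normalize(item)
        if row[1] and row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


def load(rows, conn):
    """Load cleaned rows into the table a dashboard could query."""
    conn.execute("CREATE TABLE IF NOT EXISTS posts (source TEXT, text TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(clean(RAW_ITEMS), connection)
    print(connection.execute("SELECT * FROM posts").fetchall())
```

Sentiment scoring is left out on purpose; in this sketch it would be a separate step that reads from the `posts` table.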
Next Steps
Build a data pipeline that extracts, transforms, and loads the data on an
ongoing basis.
Once this process is in place, business teams will be able to see a real-time
projection each time they log into the dashboard.

SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 

semana1.pptx

  • 2. Modern Data Ecosystem “The constant increase in data processing speeds and bandwidth, the nonstop invention of new tools for creating, sharing, and consuming data, and the steady addition of new data creators and consumers around the world, ensure that data growth continues unabated. Data begets more data in a constant virtuous cycle.” Forbes 2020 report on data in the coming decade ENTERPRISE DATA ENVIRONMENT • Data integrated from disparate sources • Different types of analysis and skills to generate insights • Active stakeholders to collaborate and act on insights generated • Tools, applications, and infrastructure to store, process, and disseminate data as required. INTERCONNECTED, INDEPENDENT, CONTINUALLY EVOLVING
  • 3. Data sources: text, images, videos, clickstreams, user conversations, social media platforms, Internet of Things (IoT) devices, real-time events that stream data, legacy databases, and data sourced from professional data providers and agencies. Data is available in a variety of structured and unstructured datasets. Pull a copy of the data from the original sources into a data repository. At this stage, you’re only looking at acquiring the data you need: working with the data formats, sources, and interfaces through which this data can be pulled in. Challenges! • Reliability • Security • Integrity
  • 4. Organized, cleaned up, and optimized for access by end users. The data will need to conform to the compliance requirements and standards enforced in the organization. For example: • Conforming to guidelines that regulate the storage and use of personal data (health, biometrics, or household data in the case of IoT devices) • Master data tables within the organization, to ensure standardization of master data across all applications and systems. Challenges! • Data management • Data repositories that provide high availability, flexibility, accessibility, and security.
  • 5. Data management, enterprise data repositories. Business stakeholders, apps, programmers, analysts, data science use cases, users. Challenges! Interfaces, APIs, and applications that can get this data to the end users in line with their specific needs. For example, Data Analysts may need the raw data to work with, business stakeholders may need reports and dashboards, and applications may need custom APIs to pull this data.
  • 6. Emerging technologies • Machine Learning: Data Scientists are creating predictive models by training machine learning algorithms on past data. • Cloud computing: Every enterprise today has access to virtually limitless storage, high-performance computing, open source technologies, machine learning technologies, and the latest tools and libraries. • Big data: It is paving the way for new tools and techniques, as well as new knowledge and insights.
  • 7. Key Players in the Data Ecosystem Organizations that are using data to uncover opportunities and are applying that knowledge to differentiate themselves are the ones leading into the future. • Identifying patterns in financial transactions to detect fraud • Using recommendation engines to drive conversion • Mining social media posts for customer voice • Personalizing offers based on customer behavior analysis. Business leaders realize that data holds the key to competitive advantage. Data Engineers, Data Analysts, Data Scientists, Business Analysts, and Business Intelligence (or BI) Analysts
  • 8. Data Engineers Data Analysts Data Scientists Business Analysts Business Intelligence (or BI) Analysts Data Professionals
  • 9. Summary • Data Engineering converts raw data into usable data. • Data Analytics uses this data to generate insights. • Data Scientists use Data Analytics and Data Engineering to predict the future using data from the past. • Business Analysts and Business Intelligence Analysts use these insights and predictions to drive decisions that benefit and grow their business.
  • 10. Functions: • Develop and maintain data architectures • Make data available for business operations and analysis Data Engineers Data Engineers work within the data ecosystem to: • Extract, integrate, and organize data from disparate sources • Clean, transform, and prepare data • Design, store, and manage data in data repositories. They enable data to be accessible in formats and systems that the various business applications, as well as stakeholders like Data Analysts and Data Scientists, can utilize. Skills: • Good knowledge of programming • Sound knowledge of systems and technology architectures • In-depth understanding of relational databases and non-relational datastores.
  • 11. Data Analysts Functions: Translate data and numbers into plain language, so organizations can make decisions. Responsibilities: • Inspect and clean data for deriving insights • Identify correlations, find patterns, and apply statistical methods to analyze and mine data • Visualize data to interpret and present the findings of data analysis. “Are the users’ search experiences generally good or bad with the search functionality on our site?” “What is the popular perception of people regarding our rebranding initiatives?” “Is there a correlation between sales of one product and another?” Skills: • Good knowledge of spreadsheets, writing queries, and using statistical tools to create charts and dashboards. • Modern data analysts also need to have some programming skills. • They need strong analytical and story-telling skills.
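The question about a correlation between the sales of two products can be illustrated with a short calculation. The following is a minimal sketch, assuming made-up weekly sales figures; the product names and numbers are hypothetical and the helper `pearson` is written here for illustration, not taken from the slides.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the common 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly unit sales for two products (illustrative numbers only).
shampoo_sales = [120, 135, 150, 160, 180, 200]
conditioner_sales = [60, 66, 77, 79, 92, 101]

r = pearson(shampoo_sales, conditioner_sales)
print(f"correlation: {r:.3f}")
```

A value near 1 suggests the two products tend to sell together, a value near -1 suggests they move in opposite directions, and a value near 0 suggests no linear relationship. Correlation alone does not show that one product's sales cause the other's.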
  • 12. Data Scientists Responsibilities: • Analyze data for actionable insights • Build Machine Learning or Deep Learning models that train on past data to create predictive models “How many new social media followers am I likely to get next month?” “What percentage of my customers am I likely to lose to competition in the next quarter?” “Is this financial transaction unusual for this customer?” Skills: • Knowledge of Mathematics and Statistics • Understanding of programming languages, databases, and building data models. • They also need to have domain knowledge.
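As a rough illustration of the "unusual transaction" question, the sketch below flags an amount that lies far from a customer's past spending. It is a simple statistical baseline (a z-score test), not a trained Machine Learning model; the function name `is_unusual`, the threshold of 3 standard deviations, and the sample amounts are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_unusual(history, amount, threshold=3.0):
    """Return True if `amount` is more than `threshold` sample standard
    deviations away from the mean of the customer's past amounts."""
    if len(history) < 2:
        raise ValueError("need at least two past transactions")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: anything different stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
past_amounts = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5]

print(is_unusual(past_amounts, 46.0))   # a typical amount
print(is_unusual(past_amounts, 900.0))  # far outside the usual range
```

In practice a data scientist would likely use richer features (merchant, location, time of day) and a model trained on labeled history; this baseline only shows the shape of the question.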
  • 13. Business Analysts Responsibilities: Leverage the work of Data Analysts and Data Scientists to look at possible implications for their business and the actions they need to take or recommend. Business Intelligence Analysts Responsibilities: • Focus on market forces and external influences that shape their business. • Organize and monitor data on different business functions • Explore that data to extract insights and actionables that improve business performance.
  • 14. What is Data Engineering? • The field of Data Engineering concerns itself with the mechanics for the flow and access of data • Its goal is to make quality data available for fact-finding and data-driven decision making • The world of data has grown to include wide-ranging sources, structures, and types of data.
  • 15. Collecting source data Processing data Storing data Making data available to users securely The field of Data Engineering involves: Develop tools, workflows, and processes that help you acquire data from multiple sources. • Design, build, and maintain scalable data architecture to store data. • Data could be stored in databases, data warehouses, data lakes, or any other type of data repository.
  • 16. Collecting source data Processing data Storing data Making data available to users securely The field of Data Engineering involves: Processing data includes cleaning, transforming, and preparing data so that it is usable. • Implement and maintain distributed systems for large-scale processing of data. • Design pipelines for the extraction, transformation, and loading of data into data repositories. • Design or implement solutions for validating and safeguarding quality, privacy, and security of data. • Optimize tools, systems, and workflows for performance, reliability, and scalability. • Ensure data meets all regulatory and compliance guidelines.
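The extraction, transformation, and loading steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the CSV content, the `orders` table, and the function names `extract`, `transform`, and `load` are hypothetical, and an in-memory SQLite database stands in for a real data repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; one row has an invalid amount on purpose.
RAW_CSV = """order_id,customer,amount
1001, Alice ,19.99
1002,Bob,not_a_number
1003,carol,5.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with invalid amounts, tidy names, cast types."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail validation
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes each one testable on its own and lets the source or destination change without touching the cleaning rules.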
  • 17. Collecting source data Processing data Storing data Making data available to users securely The field of Data Engineering involves: • Architect or implement data stores for the storage of processed data. • Ensure systems are scalable, keeping in mind the evolving nature of data and business needs. • Ensure tools and systems are in place that take care of data privacy, security, compliance, monitoring, backup, and recovery.
  • 18. Collecting source data Processing data Storing data Making data available to users securely The field of Data Engineering involves: • APIs, services, and programs that retrieve data on defined parameters for use by end-users. • Interfaces and dashboards that present data to users so they can derive insights from the data. • Ensure the right measures and checks and balances are in place to keep data secure and provide rights-based access to users.
  • 19. Architect: Skills: To architect any data management system, be it for collating source data or storing processed analysis-ready data To ensure data stores are available and optimized for use, you need to have expertise in databases. Proficiency in database tools, programming languages, and distributed systems all come under data engineering, but they may require different skill sets.
  • 20. Summary • More than any other data profession, data engineering is about the tools and technologies involved in data manipulation. • But it is also about understanding the complexities of data and how it is ultimately leveraged for fact-finding and decision-making.
  • 21. Viewpoints: Defining Data Engineering How do you define data engineering? Data engineering involves • Designing • Building • Maintaining data infrastructures and platforms Data infrastructures include • Databases • Big Data repositories • Data pipelines for transforming and moving data between data systems Data Engineers • Develop and optimize data systems • Make data available for analysis Data Analysts • Analyze data in data systems to report and derive insights Data Scientists • Perform deeper analysis on data • Develop predictive models to solve more complex data problems
  • 22. How do you define data engineering? Data Engineers ensure • Data is highly available • Consistent • Secure • Recoverable Data Analysts and Data Scientists • Make use of the data that Data Engineers provide Data Engineers work • Ensure data matches their needs
  • 23. How do you define data engineering? Data engineering involves • Designing • Maintaining • Optimizing data systems • Selecting the right databases, storage systems, and cloud architecture • Ensuring data flow inside an organization is seamless • Ensuring data can be delivered to stakeholders when they need it Data Analysts and Data Scientists • Make use of data provided by Data Engineers for purposes of analysis and predictions
  • 24. How do you define data engineering? Data Engineers • Extract and collect raw data from multiple sources • Transform and store data in table form Data Analysts and Data Scientists • Perform analysis on data prepared by Data Engineers
  • 25. How do you define data engineering? Data engineering • Is a precursor to Data Analysis and Data Science Data Engineers enable Data Analysts and Data Scientists by • Picking the right databases and tools • Building the right data pipelines Data Analysts and Data Scientists use this data to • Build reports • Perform statistical analysis
  • 26. Viewpoints: Evolution of Data Engineering Can you talk about how data engineering has evolved over the past couple of decades? • The data engineering landscape is very different • The quantity of data handled today was once unthinkable • The variety and formats of data available have evolved • Big Data was unheard of • Expectations have also evolved • Today, there is widespread use of automation tools Can you talk about how data engineering has evolved over the past couple of decades? • With cloud computing, data infrastructure is now available as a service • A Data Engineer (DE) today needs to do a lot less from scratch and spends less time on setting up and managing systems • Before: Only relational databases and data warehouses. Today: NoSQL and other types of data repositories • DEs need to be able to work with Big Data systems and pipelines • DEs need to know a variety of tools and data systems, and how to use them for solving data problems
  • 27. Can you talk about how data engineering has evolved over the past couple of decades? • Before: The approach was hierarchical. Enterprise Architects or Data Architects made decisions regarding data storage • DEs work closely with developers to see how best to meet their requirements • Availability of different types of data sources, such as the IoT and API feeds • Variety of databases: • Wide column stores (Cassandra and HBase) • Key-value pair databases • Hadoop for Big Data • Document stores such as MongoDB and Couchbase • Evolution of data sources and the variety of data have contributed • Before: Focused on database management, ETL pipelines, and data visualization • Now: Distributed computing, DevOps, and implementation of Machine Learning models
  • 29. Responsibilities and Skillsets of a Data Engineer At a broad level, Data Engineers: • Extract, organize, and integrate data from disparate sources • Prepare data for analysis and reporting by transforming and cleansing it • Design and manage data pipelines that encompass the journey of data from source to destination systems • Set up and manage the infrastructure required for the ingestion, processing, and storage of data: Data Platforms, Data Stores, Distributed systems, Data Repositories The overarching responsibility of a data engineer is to provide analytics-ready data to data consumers Data is analytics-ready when it is: • Accurate • Reliable • Compliant with the regulations that govern it • Accessible to consumers when they need it.
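One small, concrete piece of making data analytics-ready is checking that required fields are actually populated. The following is a minimal sketch of such a completeness check, assuming hypothetical customer records; the field names, the sample data, and the helper `completeness_report` are illustrative and do not come from the slides.

```python
# Hypothetical customer records; field names are illustrative only.
records = [
    {"customer_id": 1, "email": "a@example.com", "country": "PE"},
    {"customer_id": 2, "email": "", "country": "MX"},
    {"customer_id": 3, "email": "c@example.com", "country": None},
    {"customer_id": 4, "email": "d@example.com", "country": "CL"},
]


def is_blank(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""


def completeness_report(rows, required_fields):
    """Count fully populated rows and missing values per required field."""
    missing = {field: 0 for field in required_fields}
    complete_rows = 0
    for row in rows:
        blanks = [f for f in required_fields if is_blank(row.get(f))]
        for f in blanks:
            missing[f] += 1
        if not blanks:
            complete_rows += 1
    return {"total": len(rows), "complete": complete_rows, "missing": missing}


report = completeness_report(records, ["customer_id", "email", "country"])
print(report)
```

Completeness is only one dimension; accuracy, regulatory compliance, and timely access each need their own checks and are not covered by this sketch.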
  • 30. Technical Skills administrative tools, system utilities and commands load balancing and application performance monitoring
  • 31.
  • 32. Experience of working with ETL tools such as IBM InfoSphere Information Server, AWS Glue, and Improvado.
  • 33.
  • 34. Functional Skills • The ability to convert business requirements into technical specifications. • The ability to work with the complete software development lifecycle, which includes ideation, architecture, design, prototyping, testing, deployment, and monitoring. • An understanding of data’s potential application in business. • An understanding of the risks of poor data management, which essentially covers data quality, privacy, security, and compliance. Software engineering, data science, DATA ENGINEERING
  • 35. • Data engineering is a team sport. • Interpersonal skills, teamwork, and collaboration • As a data engineer, you need to be able to communicate effectively with both technical and non-technical stakeholders in a manner that a clear understanding can be established. Soft Skills DATA ENGINEER ANALYSTS DATA SCIENTISTS BUSINESS USERS TECHNICAL TEAMS
  • 36. Viewpoints: Skills and Qualities to be a Data Engineer What are some of the technical skills required to be a successful Data Engineer? Data engineering is • A fast-moving technology Data Engineers need • To love data Technical skills depend • On the job role Experience required in Retail • Relational databases • Cassandra • Google Bigtable • Kafka Stream • WebSphere MQ In Healthcare or Social Media • Need for different skillsets
  • 37. What are some of the technical skills required to be a successful Data Engineer? • Acquiring technical skills is the easier part of the overall skills required • It is important to understand the basic theory about data structures and how to work with data • Willingness to roll with change What are some of the technical skills required to be a successful Data Engineer? • Good understanding of networking, both LAN and WAN • Good understanding of different types of storage, local as well as cloud-based • Very good knowledge of operating systems • Expertise in databases, both RDBMS and NoSQL • Programming skills in any language such as Java, C, or Python • Automation skills
  • 38. Knowledge of •One or more programming languages and operating systems Knowledge of IT infrastructure and landscape: •Computer architecture •Cloud •Virtual machines •Different types of storage Proficiency in SQL Working knowledge of •One or more databases, both relational and NoSQL Knowledge and experience of: •ETL •Data Pipelines •Data Warehouses •Data Lakes •Other Big Data systems What are some of the technical skills required to be a successful Data Engineer?
  • 39. What are some of the qualities and soft skills required to be a successful Data Engineer? • Curiosity • Questioning skills: asking questions of business and technical users • Enjoying problem-solving and troubleshooting • Being good at teamwork, collaboration, and communication • Being good at logic and having an interest in coding What are some of the qualities and soft skills required to be a successful Data Engineer? • Good problem solving and communication skills • Detail-oriented control-freak • There is a lot of detail involved in working with data • It’s important to get into the details and make sure that you don’t leave any of the boxes unchecked • Being in control and aware of the choices that are being made in the data environment, and their trade-offs • Interact with developers • Defend your choices to management • Advocate for the choices you make • Good work ethic • Passion to learn What are some of the qualities and soft skills required to be a successful Data Engineer? • Curiosity • Good communication • Eagerness to learn
  • 40. Viewpoints: A Day in the Life of a Data Engineer The Need: Launching a new shampoo in the market. How a product gets talked about on social media has an immediate impact on sales numbers and brand perception (customer sentiment). They want to keep a close eye on all social media and know what online communities say about their product: positive feedback, negative comments, suggestions, even comparisons with their current favorites.
  • 41. The Goal • Prototype of a dashboard using a sentiment analysis algorithm and dummy data to serve this goal. • The dashboard displayed graphs of different customer sentiments plotted as scores across social media sources and consumer demographics. • The idea was an instant hit and got a go-ahead for implementation by the business team. • This is where our team of data engineers stepped in—to make this prototype a reality.
  • 42. The Solution My first task was to pull in data from all the social media sites and online sources identified by the business teams into the organization’s environment. I started with APIs to pull the tweets and posts with our product’s hashtag into temporary storage. I then turned to the eCommerce portals and product review blogs to collect our product-specific data and move that into temporary storage as well—an activity known as web scraping. Tweets, posts, comments, articles, even memes—there was quite a mix of formats and structures. Once I had all the data I needed, I inspected it to assess the transformations that will need to be performed on this data before it can be loaded into the database. We decided to create a Python program for processing the data and loading it into the database. I cleaned up the data and transformed it into a format that would be stored in the database. This is the database from which the dashboard is pulling the data to display the reports.
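The clean-and-load step described above could look roughly like the sketch below. It is not the program from the talk: the record shapes, the field names (`text`, `body`, `user`, `author`), the `mentions` table, and the `normalize` helper are all assumptions, and the API calls and web scraping that fetch the raw items are left out. An in-memory SQLite database stands in for the dashboard's database.

```python
import re
import sqlite3

# Hypothetical raw items from temporary storage; social posts and blog
# reviews arrive with different field names and messy whitespace.
raw_items = [
    {"source": "social", "text": "  Loving the new #Shampoo!!  ", "user": "a1"},
    {"source": "review_blog", "body": "Smells great,\n but pricey.", "author": "b2"},
    {"source": "social", "text": "", "user": "c3"},  # empty post, to be dropped
]


def normalize(item):
    """Map one raw item to a (source, author, text) row, or None if empty."""
    text = item.get("text") or item.get("body") or ""
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    if not text:
        return None
    author = item.get("user") or item.get("author") or "unknown"
    return (item["source"], author, text)


rows = [row for row in map(normalize, raw_items) if row is not None]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (source TEXT, author TEXT, text TEXT)")
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT source, author, text FROM mentions ORDER BY rowid"
).fetchall()
print(stored)
```

Mapping every source onto one common row shape before loading is what lets the dashboard query tweets, posts, and reviews from a single table.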
  • 43. Next Steps Build a data pipeline that extracts, transforms, and loads the data on an ongoing basis. Once this process is in place, business teams will be able to see a real-time projection each time they log into the dashboard.