Join us for a dynamic workshop focused on Big Data and cutting-edge data engineering tools within Google Cloud Platform (GCP). This session provides a concise yet comprehensive introduction to:
1) Core Big Data principles
2) GCP's role in modern data engineering
3) Practical use of GCP tools like BigQuery, Cloud Dataflow, and Cloud Dataprep
Engage with real-world applications and learn best practices
“Data is the new oil because data can be used to derive insights. Depending on what a company does, insights can drive customer retention, upselling, new revenue models, advertising, etc. If data is the new oil, insights are the new money.” - Forbes
What is a data engineer?
A developer with a tester mindset.
Data engineers are the people who design the systems that unify data and can help others navigate them.
Data engineers perform many different tasks, including acquisition, cleansing, conversion, and deduplication; a small Python sketch of cleansing and deduplication follows below.
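As a concrete illustration of the cleansing and deduplication tasks just mentioned, here is a minimal Python sketch using pandas; the column names and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw export: duplicate customers with inconsistent formatting.
raw = pd.DataFrame({
    "email": [" A@Example.com", "a@example.com", "b@example.com"],
    "signup_date": ["2023-01-05", "2023-01-05", "2023-02-10"],
})

# Cleansing: strip whitespace, normalize casing, and parse dates.
raw["email"] = raw["email"].str.strip().str.lower()
raw["signup_date"] = pd.to_datetime(raw["signup_date"])

# Deduplication: keep a single row per email address.
clean = raw.drop_duplicates(subset="email", keep="first")
print(clean)
```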
ETL Tools: ETL (extract, transform, load) tools move data between systems. They access data, then apply rules to “transform” it through steps that make it more suitable for analysis; a Python ETL sketch appears after this list.
SQL: Structured Query Language (SQL) is the standard language for querying relational databases.
Python: Python is a general-purpose programming language. Data engineers may choose Python for ETL tasks.
Cloud Data Storage: Including Amazon S3, Azure Data Lake Storage (ADLS), Google Cloud Storage, etc.
Query Engines: Engines run queries against data to return answers. Data engineers may work with engines like Dremio Sonar, Spark, Flink, and others; a Spark sketch follows below.
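To tie several of these tools together, here is a hedged Python sketch of a small ETL step on GCP: it loads a CSV from Cloud Storage into BigQuery and then queries it with SQL. The project, bucket, dataset, and table names are placeholders, and the sketch assumes the google-cloud-bigquery client library and default application credentials are available.

```python
from google.cloud import bigquery

# Placeholder identifiers; replace with your own project, bucket, and table.
project_id = "my-gcp-project"
table_id = f"{project_id}.analytics.orders"
gcs_uri = "gs://my-bucket/exports/orders.csv"

client = bigquery.Client(project=project_id)

# Extract + Load: ingest the CSV from Cloud Storage into a BigQuery table.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # let BigQuery infer the schema
)
client.load_table_from_uri(gcs_uri, table_id, job_config=load_config).result()

# Transform / analyze: run standard SQL against the loaded table.
sql = f"""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM `{table_id}`
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
"""
for row in client.query(sql).result():
    print(row["customer_id"], row["total_spend"])
```

And as a sketch of the query-engine category, the snippet below runs the same kind of aggregation with Spark's Python API. The input path and column names are hypothetical, and reading directly from gs:// paths requires the GCS connector on the Spark cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("query-engine-demo").getOrCreate()

# Hypothetical input; any CSV with customer_id and amount columns will do.
orders = spark.read.csv("gs://my-bucket/exports/orders.csv",
                        header=True, inferSchema=True)

# The same logic could also be written as SQL via spark.sql(...).
top_customers = (
    orders.groupBy("customer_id")
          .agg(F.sum("amount").alias("total_spend"))
          .orderBy(F.desc("total_spend"))
          .limit(10)
)
top_customers.show()
```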
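Both sketches above are illustrative only; in practice the choice between a warehouse engine like BigQuery and a general-purpose engine like Spark depends on where the data lives and how the pipeline is orchestrated.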
1980s: Server on-premises. You own everything, and you manage it.
2000s: Data Centers. You rent the space but pay for and manage the hardware, with no direct physical access to the machines.
Now: First Generation Cloud with Virtualized Data Centers. You rent hardware and space, still controlling and configuring virtual machines. Pay only for what you provision.
Next: Managed Services. Completely elastic storage, processing, ML. Pay for what you use.