Personal Information
Organization / Workplace
San Francisco Bay Area, CA United States
Occupation
Data Expert with System Architecture Insight
Industry
Technology / Software / Internet
Website
goldenorbit.wordpress.com
About
With a thorough understanding of data, application, and network architecture, Eric has developed and proven a set of approaches that improve performance and ROI by 50% to 200% on a company's existing DW/BI infrastructure.
His first philosophy is to make the best use of the tools and to create better tools. He has witnessed many poor project results simply because everyone expects the out-of-the-box features to satisfy all the requirements, yet few are willing to dive deep into a tool and explore its full potential.
We often debate which tool is the best, yet Eric believes that it is crucial to provide valuable consulting and education to enable more team members and clien...
Tags
hadoop
incremental
upsert
time travel
data warehouse
hive
hudi
delta
iceberg
data lake
big data
json
etl
nosql
sql
elt
jdbc
fastload
mapreduce
tdch
teradata
Presentations (4)
Likes (67)
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021
Tristan Baker
•
2 years ago
Spark SQL Bucketing at Facebook
Databricks
•
4 years ago
Modernizing Big Data Workload Using Amazon EMR & AWS Glue
Noritaka Sekiyama
•
4 years ago
How to test infrastructure code: automated testing for Terraform, Kubernetes, Docker, Packer and more
Yevgeniy Brikman
•
4 years ago
Presto Strata London 2019: Cost-Based Optimizer for interactive SQL on anything
Piotr Findeisen
•
4 years ago
Trillion Dollar Coach Book (Bill Campbell)
Eric Schmidt
•
5 years ago
"Smooth Operator" [Bay Area NewSQL meetup]
Kevin Xu
•
5 years ago
Dynamic pricing of Lyft rides using streaming
Amar Pai
•
5 years ago
YugaByte DB Internals - Storage Engine and Transactions
Yugabyte
•
5 years ago
What’s new in Apache Spark 2.3
DataWorks Summit
•
5 years ago
ORC improvement in Apache Spark 2.3
DataWorks Summit
•
5 years ago
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Dremio Corporation
•
6 years ago
Apache Arrow: In Theory, In Practice
Dremio Corporation
•
6 years ago
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginners | Edureka
Edureka!
•
6 years ago
Top 5 Deep Learning and AI Stories - October 6, 2017
NVIDIA
•
6 years ago
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)
Spark Summit
•
8 years ago
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Spark Summit
•
7 years ago
Scala Reflection & Runtime MetaProgramming
Meir Maor
•
7 years ago
What to Expect for Big Data and Apache Spark in 2017
Databricks
•
7 years ago
Hive: Loading Data
Benjamin Leonhardi
•
8 years ago
Tuning Java for Big Data
Scott Seighman
•
9 years ago
Deep Dive Into Catalyst: Apache Spark 2.0's Optimizer
Spark Summit
•
7 years ago
Introducing Neo4j 3.0
Neo4j
•
7 years ago
File Format Benchmark - Avro, JSON, ORC & Parquet
DataWorks Summit/Hadoop Summit
•
7 years ago
Dongwon Kim – A Comparative Performance Evaluation of Flink
Flink Forward
•
8 years ago
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
•
8 years ago
Apache Hive Hook
Minwoo Kim
•
10 years ago
Spark etl
Imran Rashid
•
8 years ago
Hive tuning
Michael Zhang
•
10 years ago