Eric Sun

4 SlideShares 20 Followers 17 Followings

With a thorough understanding of data, application, and network architecture, Eric has developed and proven a set of approaches that improve performance and ROI by 50% to 200% on top of the company's existing DW/BI infrastructure. His first philosophy is to make the best use of existing tools and to create better ones, as he has witnessed many poor project results simply because everyone expects out-of-the-box features to satisfy all the requirements, yet few are willing to dive deep into a tool and explore its full potential. We often debate which tool is the best, yet Eric believes that it is crucial to provide valuable consulting and education to enable more team members and clien...

hadoop incremental upsert time travel data warehouse hive hudi delta iceberg data lake big data json etl nosql sql elt jdbc fastload mapreduce tdch teradata

Presentations

How To Buy Data Warehouse

Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop

ETL Practices for Better or Worse

Reshape Data Lake (as of 2020.07)

Likes

Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021

Spark SQL Bucketing at Facebook

Modernizing Big Data Workload Using Amazon EMR & AWS Glue

How to test infrastructure code: automated testing for Terraform, Kubernetes, Docker, Packer and more

Presto Strata London 2019: Cost-Based Optimizer for interactive SQL on anything

Trillion Dollar Coach Book (Bill Campbell)

"Smooth Operator" [Bay Area NewSQL meetup]

Dynamic pricing of Lyft rides using streaming

YugaByte DB Internals - Storage Engine and Transactions

What’s new in Apache Spark 2.3

ORC improvement in Apache Spark 2.3

Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache

Apache Arrow: In Theory, In Practice

What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginners | Edureka

Top 5 Deep Learning and AI Stories - October 6, 2017

Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning

Scala Reflection & Runtime MetaProgramming

What to Expect for Big Data and Apache Spark in 2017

Hive: Loading Data

Tuning Java for Big Data

Deep Dive Into Catalyst: Apache Spark 2.0's Optimizer

Introducing Neo4j 3.0

File Format Benchmark - Avro, JSON, ORC & Parquet

Dongwon Kim – A Comparative Performance Evaluation of Flink

Why Apache Flink is the 4G of Big Data Analytics Frameworks

Apache Hive Hook

Spark ETL

Hive tuning