I. Introduction
II. Paper reviews
1. Lambda Architecture - Real time Data Processing
2. Delta Lake: High-Performance ACID Table Storage over
Cloud Object Stores
3. Lakehouse: A New Generation of Open Platforms that
Unify Data Warehousing and Advanced Analytics
4. Conformer: Convolution-augmented Transformer for
Speech Recognition
5. Intent Detection and Slot Filling for Vietnamese
III. Conclusion
I. Introduction
➽ My graduation thesis:
Dynamic routing in urban traffic based on
traffic conditions.
⊕ Sub-tasks:
- Prepare and process data: users’ spoken
reports about traffic congestion
- Store the reports in data storage (as audio
files)
=> Analyze them to predict traffic conditions.
Try: https://bktraffic.com/home/
1. Lambda Architecture - Real time Data Processing
- Big data is increasingly common -> data can be demanding in terms
of Volume, Variety and Velocity.
- Traditional batch processing frameworks such as Hadoop MapReduce are not
suitable for real-time applications.
- Solution: stream processing -> provides fast output for
real-time applications.
Problem: trade-off between latency and accuracy.
=> Lambda Architecture is a big data processing architecture that combines both
techniques: it balances the workload between serving real-time results and
reliable results.
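The split between the two techniques can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the event data, the `BATCH_CUTOFF` value, and the function names are all hypothetical. It shows a batch view (complete but stale), a speed view (fresh but covering only recent events), and a serving step that merges them.

```python
from collections import Counter

# Hypothetical event stream: (timestamp, road_segment) congestion reports.
events = [
    (1, "A"), (2, "B"), (3, "A"),  # already covered by the last batch run
    (4, "A"), (5, "C"),            # arrived after the last batch run
]

BATCH_CUTOFF = 3  # assumption: the batch layer has processed events up to t=3


def batch_view(all_events, cutoff):
    """Batch layer: slow but complete recomputation up to the cutoff."""
    return Counter(seg for t, seg in all_events if t <= cutoff)


def speed_view(all_events, cutoff):
    """Speed layer: incremental counts for events the batch has not seen."""
    view = Counter()
    for t, seg in all_events:
        if t > cutoff:
            view[seg] += 1
    return view


def serve(segment, batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch[segment] + speed[segment]


batch = batch_view(events, BATCH_CUTOFF)
speed = speed_view(events, BATCH_CUTOFF)
print(serve("A", batch, speed))  # 3
```

When the next batch run finishes, the cutoff moves forward and the speed view for the covered period can be discarded, which is where the latency/accuracy balance comes from.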
2. Delta Lake: High-Performance ACID Table Storage
over Cloud Object Stores
- Data storage plays a key role in big data architecture.
- Conventional data warehouses built on SQL relational databases, or even NoSQL stores,
-> are limited when handling unbounded data (real-time data streams, etc.).
- Solution: data lake (large, cost-effective, handles both structured and
unstructured data, etc.) -> has become the top choice in the industry.
Problem: a plain data lake is hard to manage and operate.
=> Delta Lake: adds an ACID layer on top of the data lake -> makes it as efficient to
query and control as a traditional data warehouse.
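The idea of an ACID layer over plain object storage can be illustrated with a toy transaction log. This is a sketch under stated assumptions, not the Delta Lake API: `ObjectStore`, `ToyTable`, the key layout, and the file names are all hypothetical, and an in-memory dict stands in for the cloud store. Data files are written first and become visible only when a numbered log entry is created atomically; readers replay the log to find the live files.

```python
import json


class ObjectStore:
    """Stand-in for a cloud object store: a flat key -> string mapping."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def put_if_absent(self, key, data):
        """Atomic create; returns False if the key already exists."""
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class ToyTable:
    """Toy table whose state is defined by an ordered commit log."""

    def __init__(self, store):
        self.store = store

    def commit(self, add=(), remove=(), version=None):
        """Write data files, then publish them with one atomic log entry."""
        if version is None:
            version = len(self.store.list_keys("_log/"))
        for name, rows in add:
            self.store.put("data/" + name, json.dumps(rows))
        entry = {"add": [name for name, _ in add], "remove": list(remove)}
        return self.store.put_if_absent("_log/%020d.json" % version,
                                        json.dumps(entry))

    def read(self):
        """Replay the log to find live files, then read only those."""
        live = []
        for key in self.store.list_keys("_log/"):
            entry = json.loads(self.store.get(key))
            live = [f for f in live if f not in entry["remove"]] + entry["add"]
        rows = []
        for name in live:
            rows.extend(json.loads(self.store.get("data/" + name)))
        return rows


table = ToyTable(ObjectStore())
table.commit(add=[("part-0.json", [{"segment": "A", "jam": True}])])
table.commit(add=[("part-1.json", [{"segment": "B", "jam": False}])],
             remove=["part-0.json"])
print(table.read())  # [{'segment': 'B', 'jam': False}]

# A writer that lost the race for version 1 is rejected; its data file
# was written but is never referenced by the log, so readers ignore it.
print(table.commit(add=[("part-2.json", [])], version=1))  # False
```

Because a commit either creates its log entry or fails, readers never observe a half-written update, which is the property a plain data lake lacks.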
3. Lakehouse: A New Generation of Open Platforms that
Unify Data Warehousing and Advanced Analytics
- Builds on Delta Lake to argue for the superiority of the Lakehouse architecture over the data
warehouse architecture.
- With an advanced data storage system, it unifies batch & stream processing engines ->
reduces the development and maintenance burden that the Lambda architecture
imposes.
Delta Lake + Apache Spark + minimal specialized services
=> Full-fledged big data architecture: scalable, high-performance,
applicable to machine learning and deep learning techniques, etc.
4. Conformer: Convolution-augmented Transformer for Speech
Recognition
2 prevalent AI techniques:
● Transformer (e.g. BERT): captures content-based global interactions.
● Convolutional neural network (CNN): exploits local features.
=> Novel model that combines both techniques: the Convolution-augmented Transformer,
namely Conformer.
- Previous Transformer- or CNN-based models have shown promising results in
Automatic Speech Recognition (ASR).
- Conformer (based on both) -> outperforms them and achieved SOTA results in ASR at publication (2020).
=> It can capture both local and global dependencies of an audio sequence.
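The local/global distinction can be made concrete with a toy example on a sequence of scalars. This is only an illustration, not the Conformer model: real models use learned multi-head attention over feature vectors, and the actual block also includes feed-forward modules and layer normalization, which are omitted here. The function names and inputs are hypothetical.

```python
import math


def conv1d(x, kernel):
    """Local mixing: each output depends only on a small window (zero padding)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(x):
                acc += w * x[k]
        out.append(acc)
    return out


def self_attention(x):
    """Global mixing: each output is a softmax-weighted average of all positions."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def toy_block(x, kernel):
    """Attention then convolution, each with a residual connection."""
    h = [a + b for a, b in zip(x, self_attention(x))]
    return [a + b for a, b in zip(h, conv1d(h, kernel))]


kernel = [0.25, 0.5, 0.25]
a = [1.0, 0.5, 0.0, 0.0, 0.0, 2.0]
b = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]  # differs from `a` only at the last position

# A far-away change does not reach position 0 through the convolution...
print(conv1d(a, kernel)[0] == conv1d(b, kernel)[0])  # True
# ...but it does through self-attention.
print(self_attention(a)[0] == self_attention(b)[0])  # False
```

Stacking both operations, as `toy_block` does, lets a single layer react to nearby context and to distant context at the same time.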
5. Intent Detection and Slot Filling for Vietnamese
- From VinAI, a Vietnamese AI research institution; builds on “BERT for Joint Intent
Classification and Slot Filling”.
- Original model: based on BERT to understand human sentences (natural
language understanding).
- Its main purpose: detect whether a sentence is about a subject that we are
interested in (intent detection) and extract the important information from it (slot
filling).
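The two outputs can be pictured as one intent label per sentence plus one tag per token. The sketch below assumes the common BIO tagging scheme for slots; the sentence, the intent name, and the slot types are hypothetical examples, not taken from the paper's dataset. It shows how tagged tokens are grouped back into slot values.

```python
def extract_slots(tokens, tags):
    """Group BIO-tagged tokens into (slot_type, text) pairs."""
    slots = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; close the previous one if any.
            if current_type is not None:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the slot currently being built.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent "I-" tag) ends the current slot.
            if current_type is not None:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        slots.append((current_type, " ".join(current_tokens)))
    return slots


# Hypothetical model output for a transcribed traffic report.
intent = "report_congestion"
tokens = ["heavy", "traffic", "jam", "on", "Ly", "Thuong", "Kiet",
          "street", "near", "Bach", "Khoa", "university"]
tags = ["O", "O", "O", "O", "B-street", "I-street", "I-street",
        "O", "O", "B-landmark", "I-landmark", "O"]

print(extract_slots(tokens, tags))
# [('street', 'Ly Thuong Kiet'), ('landmark', 'Bach Khoa')]
```

For the thesis use case, the intent would decide whether a report concerns congestion at all, and the slots would supply the location to update.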
III. Conclusion
The 5 papers support my research work:
- Developing and deploying a system that is efficient for big data analysis (in my
case, analysis of users’ spoken reports)
- Applied solutions: speech-to-text recognition + text information extraction.
Papers:
[1] Kumar, Yuvraj (2020). Lambda Architecture - Real time Data Processing. DOI:
10.13140/RG.2.2.19091.84004
[2] Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores (2020). DOI:
10.14778/3415478.3415560
[3] Armbrust, Michael and Ghodsi, Ali and Xin, Reynold and Zaharia, Matei (2021). Lakehouse: A
New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics.
[4] Gulati, Anmol and Qin, James and Chiu, Chung-Cheng and Parmar, Niki and Zhang, Yu and Yu,
Jiahui and Han, Wei and Wang, Shibo and Zhang, Zhengdong and Wu, Yonghui and others
(2020). Conformer: Convolution-augmented Transformer for Speech Recognition. URL:
https://arxiv.org/abs/2005.08100
[5] Dao, Mai Hoang and Truong, Thinh Hung and Nguyen, Dat Quoc (2021). Intent Detection and
Slot Filling for Vietnamese. URL: https://arxiv.org/abs/2104.02021