2. About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
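13. SQL Practice
• How many orders were shipped to each destination country between 1995-10-11 and 1996-07-01?
• Across all sales records, who are the top three salespeople by sales revenue?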
-- Orders per ship country between 1995-10-11 and 1996-07-01
select shipcountry, count(*) as order_count
from orders
where orderdate between date('1995-10-11') and date('1996-07-01')
group by shipcountry;

-- Top three salespeople by total sales (unit price * quantity)
select sum(sales_price) as total_sales, firstname from (
    select c.firstname, b.sales_price from orders a
    join (select orderid, unitprice*quantity as sales_price from `order details`) b
        on a.orderid = b.orderid
    join employees c
        on a.employeeid = c.employeeid
) d
group by firstname
order by total_sales desc
limit 3;
14. SQL Practice
• For all orders placed in 1996, compute the sales amount per category (CategoryName).
-- 1996 sales amount per category, smallest total first
select categoryname, sum(sales_price) as total_sales from (
    select a.orderid, c.productid, b.unitprice*b.quantity as sales_price, d.categoryname from (
        select orderid from orders
        where year(orderdate) = 1996
    ) a
    join `order details` b
        on a.orderid = b.orderid
    join products c
        on b.productid = c.productid
    join categories d
        on c.categoryid = d.categoryid
) e
group by categoryname
order by total_sales;
24. ETL using Python
• Common data-wrangling operations in ETL
• drop table if exists
• create temp table
• drop temp table
• select * from table into table
• create table as select
• table join
• select * from table where column in
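The operations listed above can be sketched end to end from Python. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real server, and the table names (`stage_orders`, `order_report`) and sample rows are made up for the example. SQLite has no `select ... into table`, so that item is not shown; `create table as select` covers the same intent here.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source tables with a few made-up rows.
cur.execute("create table orders (orderid integer, employeeid integer, shipcountry text)")
cur.execute("create table employees (employeeid integer, firstname text)")
cur.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, 1, "France"), (2, 2, "Germany"), (3, 1, "Brazil")],
)
cur.executemany("insert into employees values (?, ?)", [(1, "Nancy"), (2, "Andrew")])

# drop table if exists: lets the job be re-run without failing on a leftover table.
cur.execute("drop table if exists order_report")

# create temp table + where column in: stage a filtered subset of the source.
cur.execute("""
    create temp table stage_orders as
    select * from orders where shipcountry in ('France', 'Brazil')
""")

# create table as select + table join: build the final table from the staged rows.
cur.execute("""
    create table order_report as
    select s.orderid, s.shipcountry, e.firstname
    from stage_orders s
    join employees e on s.employeeid = e.employeeid
""")

# drop temp table: clean up the staging table once the result is persisted.
cur.execute("drop table stage_orders")
conn.commit()

report = cur.execute(
    "select orderid, shipcountry, firstname from order_report order by orderid"
).fetchall()
print(report)
```

Staging into a temporary table before the join keeps each step small and easy to inspect, which is the usual reason these operations appear together in ETL jobs.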
25. ETL using Python
• Read a local CSV file, clean the data with Python, and write it to the database
• Read a table from MySQL, clean and transform the data, then write it back to the database
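Both flows can be sketched with the standard library alone. This is not the content of `etl.ipynb`; it is a minimal illustration that assumes a made-up CSV layout (`country`, `amount`), hypothetical table names (`sales_raw`, `sales_by_country`), and an in-memory SQLite database in place of MySQL so the example runs without a server. With a real MySQL server, a DB-API driver connection would replace `sqlite3.connect`, and the parameter placeholder would typically be `%s` rather than `?`.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_to_db(csv_path, conn):
    """Flow 1: read a local CSV, clean it in Python, write it to the database."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Transform: trim and title-case the country, cast the amount to float,
    # and skip rows whose amount is empty.
    cleaned = [
        (r["country"].strip().title(), float(r["amount"]))
        for r in rows
        if r["amount"].strip()
    ]
    conn.execute("drop table if exists sales_raw")
    conn.execute("create table sales_raw (country text, amount real)")
    conn.executemany("insert into sales_raw values (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


def transform_table(conn):
    """Flow 2: read a table, aggregate it in Python, write the result back."""
    rows = conn.execute("select country, amount from sales_raw").fetchall()
    totals = {}
    for country, amount in rows:
        totals[country] = totals.get(country, 0.0) + amount
    result = sorted(totals.items())
    conn.execute("drop table if exists sales_by_country")
    conn.execute("create table sales_by_country (country text, total_amount real)")
    conn.executemany("insert into sales_by_country values (?, ?)", result)
    conn.commit()
    return result


# Hypothetical sample file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("country,amount\n france ,10.5\nGERMANY,20\nfrance,4.5\nbrazil,\n")
    csv_path = tmp.name

conn = sqlite3.connect(":memory:")  # assumption: stand-in for the MySQL connection
loaded = load_csv_to_db(csv_path, conn)
summary = transform_table(conn)
os.remove(csv_path)
print(loaded, summary)
```

Dropping and recreating the target tables on every run keeps the job idempotent, at the cost of rewriting all rows each time; an incremental load would need a different design.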
etl.ipynb