In this session we discuss what a Data Lake is and what Microsoft offers within the Microsoft Azure platform to address scalability, performance, and security problems.
8. Unstructured Data
Information without a [Pre-Defined Model] or that is not Organized
Typically Text-Heavy, with Irregularities & Ambiguities
Difficult to Understand using Traditional Programs
IDC & EMC Project Data Growth to 40 Zettabytes by 2020
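As a small illustration of why text-heavy data with irregularities is hard for a traditional program, the sketch below applies one fixed rule to three free-text messages. The messages, the order number, and the pattern are all hypothetical, invented only for this example.

```python
import re

# Hypothetical free-text messages: all three mention the same order number,
# but each one phrases it differently (irregularity and ambiguity).
messages = [
    "Order #1042 arrived damaged.",
    "my order number is 1042, and it arrived damaged",
    "Pedido 1042 chegou quebrado",
]

# A rigid rule of the kind a traditional program would rely on (an assumption
# for this sketch, not a pattern taken from the slides).
ORDER_PATTERN = re.compile(r"Order #(\d+)")


def extract_order_id(text):
    """Return the order id if the text matches the fixed pattern, else None."""
    match = ORDER_PATTERN.search(text)
    return match.group(1) if match else None


results = [extract_order_id(m) for m in messages]
print(results)  # ['1042', None, None]
```

Only the first message fits the rule; the other two carry the same information but are missed, which is the practical difficulty the slide points at.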
9. Data Lake
Method of Storing Data within a System or Repository
Data Lake Characteristics
- Structured = Relational Databases [Rows & Columns]
- Semi-Structured = CSV, Logs, XML, JSON
- Unstructured = Emails, Documents, Binaries, Audio, Video, PDFs
- Open-Source = Apache Hadoop [HDFS]
- Microsoft Azure = Azure Data Lake Store [ADLS]
- Amazon AWS = Amazon S3
Used for
- Reporting
- Visualization
- Analytics
- Machine Learning
Single Store of Data in the Enterprise, Ranging from Raw Data [Copy of Source System]
Centralized Data Store for Enterprises
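To make the semi-structured category concrete, here is a minimal sketch that reads CSV, JSON, and XML payloads into plain Python dictionaries using only the standard library. The sample payloads and field names are hypothetical; the point is that each format carries its field names with the data, but not in the same way.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payloads, one per semi-structured format.
csv_text = "id,name\n1,Ana\n"
json_text = '{"id": 2, "name": "Bruno", "tags": ["vip"]}'
xml_text = "<customer><id>3</id><name>Carla</name></customer>"


def from_csv(text):
    """CSV: the header row names the fields; every value is read as a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """JSON: field names, types, and nesting travel with each record."""
    return json.loads(text)


def from_xml(text):
    """XML: child tags name the fields; element text is read as a string."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


records = from_csv(csv_text) + [from_json(json_text), from_xml(xml_text)]
for record in records:
    print(record)
```

Note that the three records do not share types (the CSV and XML ids are strings, the JSON id is an integer) and only the JSON record has a nested list, so a consumer still has to decide how to reconcile them.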
10. Data Warehouse vs. Data Lake
Data Warehouse
- Data = Structured, Processed
- Schema-On-Write
- Expensive for Large Data Volumes
- Less Agile, Fixed Configuration
- Users = Business Professionals
Data Lake
- Data = Structured / Semi-Structured / Unstructured / Raw
- Schema-On-Read
- Designed for Low-Cost Storage
- Highly Agile, Configure & Reconfigure
- Users = Data Scientists
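The difference between Schema-On-Write and Schema-On-Read can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how any particular warehouse or lake engine is implemented: the event records, the `amount` field, and both function names are hypothetical.

```python
import json

# Hypothetical raw events as they might land in a data lake. No schema is
# enforced when they are written, so the shape differs between records.
raw_lines = [
    '{"user": "ana", "amount": "10.5", "channel": "web"}',
    '{"user": "bruno", "amount": 7}',
    '{"user": "carla", "note": "no amount recorded"}',
]


def write_with_schema(line):
    """Schema-on-write: validate before storing and reject rows that do not fit."""
    record = json.loads(line)
    if not isinstance(record.get("amount"), (int, float)):
        raise ValueError(f"rejected at write time: {record}")
    return {"user": record["user"], "amount": float(record["amount"])}


def read_with_schema(line):
    """Schema-on-read: keep everything and coerce to a schema only when queried."""
    record = json.loads(line)
    amount = record.get("amount")
    return {
        "user": record.get("user"),
        "amount": float(amount) if amount is not None else None,
    }


accepted, rejected = [], 0
for line in raw_lines:
    try:
        accepted.append(write_with_schema(line))
    except ValueError:
        rejected += 1

projected = [read_with_schema(line) for line in raw_lines]
print(accepted, rejected)
print(projected)
```

With schema-on-write, two of the three events never reach storage. With schema-on-read, all three are kept in raw form and the interpretation (here, coercing `amount` to a float or `None`) can be changed later without reloading the data, which is the agility the comparison refers to.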
12. Databricks
The Unified Analytics Platform
Unifying Data Science, Engineering and Business
- Accelerate performance with an optimized Spark platform
- Increase productivity through interactive data science
- Streamline processes from ETL to production
- Reduce cost and complexity with a fully managed, cloud-native platform
13. Azure Data Lake Store [ADLS]
Unlimited Storage Capability – Petabyte-Size Files & Trillions of Objects
Scale Throughput for Massively-Parallel Analytics
HDFS [Hadoop Distributed File-System] for Cloud [Microsoft Azure]
1 TB – R$ 120 | 10 TB – R$ 1.962 | 100 TB – R$ 9.628 – by Monthly Commitment Package
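For comparing the commitment packages, the effective price per terabyte can be worked out as below. The figures are taken exactly as printed on the slide (with "1.962" and "9.628" read as Brazilian-format thousands, i.e. 1962 and 9628) and are not checked against current Azure pricing; the variable and function names are illustrative.

```python
# Monthly commitment packages as printed on the slide: size in TB -> R$ per month.
# "1.962" and "9.628" use the Brazilian thousands separator (1962 and 9628).
packages_brl = {1: 120.0, 10: 1962.0, 100: 9628.0}


def price_per_tb(size_tb, monthly_price):
    """Effective monthly price of one terabyte within a package."""
    return monthly_price / size_tb


per_tb = {
    size: round(price_per_tb(size, price), 2)
    for size, price in packages_brl.items()
}

for size, value in per_tb.items():
    print(f"{size} TB package: R$ {value:.2f} per TB per month")
```

The result is only as reliable as the printed figures, so it is worth confirming the package prices against the provider's price list before relying on the comparison.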
17. Training
Big Data for DBAs and Developers in the Microsoft Azure Ecosystem
The Core Understanding
- Cloud Computing
- Big Data
- SQL Server 2016/2017
- NoSQL
- Apache Software Foundation
- Hadoop
- Spark
- Microsoft Azure
- HDInsight
- Azure Data Lake Store/Analytics
The Programming Languages
- Hive
- Drill
- Pig
- PolyBase
- Phoenix
- Scala
- PySpark
- Spark-SQL
The Real-State Databases
- Azure SQL DB
- Azure SQL DW
- HBase
- CosmosDB
The In-Memory & Streaming Processing
- Spark
- Databricks
- Storm
- Azure Stream Analytics [ASA]
- Kafka
The On-Demand & Analytical Processing
- Azure Data Lake Store
The Data-Transfer & Orchestrators
- Sqoop
- Oozie
- Apache NiFi
- Apache Airflow
- Azure Data Factory [ADF]
~ 24 Hrs
Location – Belo Horizonte
Friday, Saturday and Sunday, from 09:00 to 18:00
- 03/08/2018
- 04/08/2018
- 05/08/2018
http://bit.do/bhbigdata