Digital Transformation
with Microsoft Azure
Cloud Computing
Big Data
Data Lake
Method of Storing Data within a System or Repository
Data Lake Characteristics
- Structured = Relational Databases [Rows & Columns]
- Semi-Structured = CSV, Logs, XML, JSON
- Unstructured = Emails, Documents, Binaries, Audio, Video, PDFs
- Open-Source = Apache Hadoop [HDFS]
- Microsoft Azure = Azure Data Lake Store [ADLS]
- Amazon AWS = Amazon S3
Used for
- Reporting
- Visualization
- Analytics
- Machine Learning
Single Store of Data in the Enterprise, Ranging from Raw Data [Copy of Source System]
Centralized Data Store for Enterprises
Data Warehouse vs. Data Lake
Data Warehouse
- Data = Structured, Processed
- Schema-On-Write
- Expensive for Large Data Volumes
- Less Agile, Fixed Configuration
- Users = Business Professionals
Data Lake
- Data = Structured / Semi-Structured / Unstructured / Raw
- Schema-On-Read
- Designed for Low-Cost Storage
- Highly Agile, Configure & Reconfigure
- Users = Data Scientists
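
The schema-on-write versus schema-on-read contrast can be illustrated with a minimal Python sketch. The records, field names, and both helper functions are hypothetical and exist only for illustration; real warehouses and data lakes enforce schemas through their own engines.

```python
import json

# Hypothetical raw records as they might land in a data lake (no schema enforced).
RAW_LINES = [
    '{"id": 1, "amount": "19.90", "channel": "web"}',
    '{"id": 2, "amount": "5.00"}',
    '{"id": "3", "amount": "oops", "note": "malformed amount"}',
]


def write_with_schema(lines):
    """Schema-on-write: validate and convert before storing; reject bad rows."""
    table, rejected = [], []
    for line in lines:
        rec = json.loads(line)
        try:
            table.append({"id": int(rec["id"]), "amount": float(rec["amount"])})
        except (KeyError, ValueError):
            rejected.append(line)
    return table, rejected


def read_with_schema(lines, schema):
    """Schema-on-read: keep raw lines as-is; apply a schema only when querying."""
    for line in lines:
        rec = json.loads(line)
        row = {}
        for field, cast in schema.items():
            try:
                row[field] = cast(rec[field])
            except (KeyError, ValueError):
                row[field] = None  # missing or unparsable values become None
        yield row


# Warehouse style: the fixed schema is applied once, at load time.
warehouse_rows, rejected = write_with_schema(RAW_LINES)

# Lake style: the same raw lines can be read later with a different schema.
lake_rows = list(read_with_schema(RAW_LINES, {"id": int, "channel": str}))
```

In this sketch the warehouse path drops the malformed record at load time, while the lake path keeps every raw line and lets each reader decide which fields matter.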
CashLess
Framework
Apache Drill: Inspired by Google's Dremel [BigQuery]
https://research.google.com/pubs/pub36632.html
https://cloud.google.com/bigquery/
Schema-Free SQL Query Engine for Hadoop, NoSQL & Storage
Query Engine [ANSI-SQL] for Big Data [Raw] Exploration
For Analysts, Business Users, Data Scientists & Data Developers
1. Self-Service Exploration
2. Data Agility
3. Interactive Query Response Time and Scale
Use Cases for Apache Drill
1. Raw Data Exploration
2. Data Discovery
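
As a rough illustration of raw data exploration, the sketch below prepares a SQL query for Drill's REST interface from Python. It assumes a Drill instance listening on localhost:8047 and a hypothetical raw file at /data/raw/orders.json with a `channel` field; neither comes from the slides. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumption: a Drill instance exposing its REST interface on localhost:8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request carrying one SQL statement."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical raw JSON file queried in place, with no table defined up front.
sql = (
    "SELECT t.channel, COUNT(*) AS orders "
    "FROM dfs.`/data/raw/orders.json` t "
    "GROUP BY t.channel"
)
request = build_drill_request(sql)

# To run it against a live Drill instance:
# with urllib.request.urlopen(request) as resp:
#     rows = json.load(resp)["rows"]
```

The point of the sketch is that the query addresses the file directly: no schema or table has to be declared before the data can be explored.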
Azure Data Lake Analytics [ADLA]
On-Demand Analytics Job Service
Start in Seconds, Scale Instantly and Pay per Job
Develop Massively Parallel Programs [MPP] with Simplicity
100 Hrs = R$ 332 | 500 Hrs = R$ 1.494, by Monthly Commitment Package
[HaaS] - Hadoop-as-a-Service
Store Destinations
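
The two Monthly Commitment Package prices can be compared per hour with a little arithmetic. This sketch assumes that "R$ 1.494" uses the Brazilian thousands separator and therefore means 1494 reais; the function name is hypothetical.

```python
# Package prices as listed on the slide; "R$ 1.494" is read as 1494 reais
# (assumption: the dot is a Brazilian thousands separator).
PACKAGES = {100: 332.0, 500: 1494.0}  # hours -> monthly price in R$


def price_per_hour(hours, price):
    """Effective hourly rate of a commitment package."""
    return price / hours


rates = {hours: price_per_hour(hours, price) for hours, price in PACKAGES.items()}

# Relative saving per hour when moving from the 100 h to the 500 h package.
saving = 1 - rates[500] / rates[100]

for hours, rate in rates.items():
    print(f"{hours} h package: R$ {rate:.3f} per hour")
print(f"Saving with the larger package: {saving:.0%}")
```

Under that reading, the packages work out to about R$ 3.32 and R$ 2.99 per hour, so the larger commitment is roughly 10% cheaper per hour.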
Databricks
The Unified Analytics Platform
Unifying Data Science, Engineering and Business
- Accelerate performance with an optimized Spark platform
- Increase productivity through interactive data science
- Streamline processes from ETL to production
- Reduce cost and complexity with a fully managed, cloud-native platform
Thank You!
@luansql
