Declare Victory with
Big Data
Clemens Szyperski
Principal Group Software Engineering Manager,
Big Data Analytics
Malaga, Spain
May 2017
Masters in EE: RWTH Aachen
PhD in CS: ETH Zurich
(under Niklaus Wirth)
Post Doc: ICSI at UC Berkeley
Co-founder of Oberon microsystems
and Myriad Group
Assoc/Prof CS at Queensland UT,
Brisbane, Australia
Architect, Developer, Lead,
Manager, Group Manager
in
Research, Office, Connected
Systems, Data Group
at
Microsoft since 1999
Who is
Clemens?
Languages: Oberon 2, Sather 2,
Component Pascal, Mianjin, PQEL (M),
SA-QL, U-SQL
Systems: Ethos, Tenet 2, Gardens, Blackbox
Component Builder, Project Oslo, .Net
Managed Extensibility Framework, Power
Query (in Excel & Power BI), Azure Stream
Analytics, Azure Data Lake Analytics, Azure
Time Series Insights
Big Data and
Machine Learning / AI
Big Data means
 - a lot of data (volume)
 - crazy shapes (variety)
 - incoming!! (velocity)
Analytics (incl. ML/AI)
over Big Data means
 - compute at massive scale
 - complexity
 - fault tolerance
XKCD license
The Big Data
Explosion
[Figure: data categories plotted by Data Size (GB, TB, PB, EB, …) against Data Complexity (variety and velocity)]
Big Data: log files, spatial & GPS coordinates, data market feeds, eGov feeds, weather, text/image, click stream, wikis/blogs, sensors/RFID/devices, social sentiment, audio/video
Web 2.0: web logs, digital marketing, search marketing, recommendations, advertising, mobile, collaboration, eCommerce
ERM, CRM: payables, payroll, inventory, contacts, deal tracking, sales pipeline
Prefix  Symbol  1000^n  10^n   Short-scale name
yotta   Y       1000^8  10^24  septillion
zetta   Z       1000^7  10^21  sextillion
exa     E       1000^6  10^18  quintillion
peta    P       1000^5  10^15  quadrillion
tera    T       1000^4  10^12  trillion
giga    G       1000^3  10^9   billion
mega    M       1000^2  10^6   million
[Figure: Growth of data, all data generated, 1985 to 2020; labeled eras: ANALOG, DIGITAL, INTERNET CONNECTED, MOBILE, CLOUD]
Schema agility AND experimentation
AND ML, image Processing,
graph, streaming
Operational Data
Highly modeled schema
Relational algebra
Examples:
Big Data at
Microsoft
Cosmos and Scope
 - Rooted in Dryad
 - A decade of development
Productized as Azure Data Lake
 - ADL Store
 - ADL Analytics with U-SQL
Kafka Four Commas Club
(Ingestion of a Trillion+ Events/Day)
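A quick arithmetic check of the "four commas" label, as a sketch: a trillion written with thousands separators has four commas, and a trillion events per day averages out to more than eleven million events per second. The constant names are illustrative only.

```python
# A trillion events per day, the threshold named on the slide.
EVENTS_PER_DAY = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

formatted = f"{EVENTS_PER_DAY:,}"        # thousands separators added
commas = formatted.count(",")            # number of commas in a trillion
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(formatted, commas, round(events_per_second))
```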
Cosmos stores exabytes of active data
Scope processes hundreds of petabytes a day
Supports batch, interactive, streaming, ML
Data ranges across most Microsoft products
Bing and MSN click streams
Office and Windows telemetry
Xbox gaming
Cosmos comprises
hundreds of thousands of machines
millions of cores
petabytes of RAM
exabytes of disks
Still only a fraction of the
global Microsoft Cloud
Demo
Applying ML and Cognition at Scale
How can you
leverage
Big Data?
Use the Power of Public Cloud Services
to move beyond
- hardware lifecycles
- infrastructure management
- physical and cyber security infrastructure
- inflexible demand/scale/cost structures
- inadequate geo reach
Well, it can be fairly simple, actually
Full Service
Family of services designed for
composition is called
Platform as a Service
• Contrast with low-level building
blocks (VMs, storage, network):
Infrastructure as a Service
• Contrast also with finished
solution services:
Software as a Service
Composition units are fully
deployed and operated instances
• Contrast this with source reuse
and with software components
• Litmus test: Can you ask “who
pays the power bill?”
Cloud services allow users to shed
the cost of operations, enable users
to stay on top of software and
hardware trends, and virtualize
physical resources of extreme
capacity.
Who does What
Note that private clouds aim for
many of the same benefits, but on top of
resources deployed and controlled
on a customer’s premises.
Azure Stack enables such private
clouds while retaining much of the
Azure model.
Key Cloud Value Proposition: Separation of Responsibilities
Stack layers (identical in every model, top to bottom):
Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking
Models (one column each):
On Premises | Infrastructure (as a Service) | Platform (as a Service) | Software (as a Service)
Legend: Customer Manages | Provider Manages
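The separation of responsibilities can be sketched as a cut through the stack. The layer names come from the slide; where the cut falls for each model is an assumption taken from the commonly used version of this diagram, since the extracted text does not preserve which cells were customer-managed. The names `CUSTOMER_LAYERS` and `who_manages` are hypothetical.

```python
# Layers from the slide, top of the stack first.
LAYERS = ["Applications", "Data", "Runtime", "Middleware", "OS",
          "Virtualization", "Servers", "Storage", "Networking"]

# ASSUMPTION: how many of the top layers the customer still manages,
# following the commonly used version of this diagram.
CUSTOMER_LAYERS = {
    "On Premises": 9,
    "Infrastructure (as a Service)": 5,
    "Platform (as a Service)": 2,
    "Software (as a Service)": 0,
}

def who_manages(model):
    """Split the stack into (customer-managed, provider-managed) layers."""
    cut = CUSTOMER_LAYERS[model]
    return LAYERS[:cut], LAYERS[cut:]

for model in CUSTOMER_LAYERS:
    customer, provider = who_manages(model)
    print(f"{model}: customer={customer} provider={provider}")
```

Moving from left to right in the diagram only moves the cut upward; the layers themselves never change.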
Service Quality
Security, Privacy
Tenancy Model
Performance Predictability
Scale vs. Price
Result Semantics
Management
Geographic and Geopolitical
Reliability
There are many qualities that
characterize a service.
The total design space for services is
overconstrained if all qualities are
equally important.
Understanding tradeoffs is thus
essential – and defines the service
engineering discipline.
Security, Privacy
• Encryption
• Data at rest (or always)
• Service- vs. user-managed keys
• Authentication
• Two-factor authentication
• Azure Active Directory (AAD) integration
• Federation with other providers and on-premises systems (AD)
• Authorization
• Role definitions, user-to-role assignments (RBAC)
• Managed shareable access keys (SAS, OAuth)
• DoS protection, attack and intrusion detection
• Network isolation
• All IP endpoints that are internal to a solution, on premises or in
cloud, are not exposed on the Internet (VNets: virtual networks)
Is the data secured?
Is the solution IP secured?
Is the service quality secured?
Performance
Predictability
Spectrum of predictability based on willingness to pay
• Static pre-allocation of fully dedicated resources
• Based on conservative static calculation of resource needs
• Based on requested resources – typically based on estimates
grounded in historic observation
• Either fails early or will run with promised resources
• Can be undermined by failures of underlying infrastructure
• Dynamic allocation of needed resources
• Risk of out-of-resource rejection in mid course
• Dynamic sharing of resources
• Risk of noisy neighbor impact
• Can use quota enforcement policies to keep individual resource
consumers within bounds
• Typically subject to overbooking policies to ensure high level of
resource utilization (and thus delivery at low cost)
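The two ends of this spectrum can be contrasted in a toy model. This is a minimal sketch, not any real scheduler: the `Pool` class and its abstract resource units are hypothetical. Static pre-allocation either fails before work starts or holds the promised resources; dynamic allocation can be rejected in mid course.

```python
class Pool:
    """Toy resource pool; units are abstract (for example cores)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def reserve(self, units):
        """Static pre-allocation: reject up front or hold the full request."""
        if self.in_use + units > self.capacity:
            return False              # fails early, before any work starts
        self.in_use += units
        return True

    def grow(self, units):
        """Dynamic allocation: take more units mid-run; may be rejected."""
        if self.in_use + units > self.capacity:
            raise RuntimeError("out of resources in mid course")
        self.in_use += units


pool = Pool(capacity=10)
static_ok = pool.reserve(8)           # job A now holds its promised 8 units
static_rejected = pool.reserve(4)     # job B is refused before it starts
pool.grow(2)                          # job C grows dynamically to 2 units
try:
    pool.grow(1)                      # no headroom left for further growth
    dynamic_failed = False
except RuntimeError:
    dynamic_failed = True
```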
Common performance metrics:
Throughput – data processed per
time unit.
Latency – time between earliest
possible and actual delivery of
results.
Metrics are usually observed
relative to benchmark or actual
workloads.
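Both metrics reduce to simple ratios and differences. The sketch below uses made-up numbers for a hypothetical run; the function names are illustrative.

```python
def throughput(bytes_processed, seconds):
    """Data processed per time unit (here: bytes per second)."""
    return bytes_processed / seconds

def latency(earliest_possible, actual_delivery):
    """Delay between the earliest possible and the actual delivery time."""
    return actual_delivery - earliest_possible

# Hypothetical run: 6 GB processed in 120 s; a result that could have been
# delivered at t = 10.0 s actually arrived at t = 12.5 s.
tp = throughput(6 * 10**9, 120)
lat = latency(10.0, 12.5)
print(tp, lat)
```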
Tenancy Model
Single-Tenant Services
• Dedicated and isolated resources are granted to a tenant
• Example: dedicated clusters (VM sets with cluster-level features
for management, monitoring, etc.)
Multi-Tenant Services
• Resources are shared among tenants – simpler and cheaper
• Example: job-execution service
• Tenant isolation is fine-grained
• Security bar is more difficult to uphold
• Predictable Performance often impacted: “noisy neighbors”
• Performance irregularities caused by heavy use of shared resources
by some other users
A tenant is a logical customer
organization.
Multiple users may authenticate
under the same tenant.
A large customer may have multiple
tenancies.
Scale vs. Price
One Size will never Fit All
• Optimizing for most desirable
qualities (high availability,
reliability, predictability, security,
…) will counteract optimizing for
price
Average vs. Peak
• Max resource envelope scale
(like dedicated clusters)
• Max job envelope scale (auto-
resourcing per job submission)
• Actual job envelope scale (elastic
resourcing over duration of job)
• Likely discretized to max. elastic
adaptation rate
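The three envelopes can be compared on one workload. This is a sketch under stated assumptions: the demand series, the cluster size, and the function names are invented for illustration, and "cost" is counted simply as resource units multiplied by time.

```python
def max_job_envelope(demand):
    """Resource-time billed when the job holds its peak for its whole run."""
    return max(demand) * len(demand)

def actual_job_envelope(demand, step=1):
    """Resource-time billed under elastic resourcing.

    `step` models the adaptation rate: the allocation can only change every
    `step` time units, so each window is billed at that window's peak.
    """
    total = 0
    for start in range(0, len(demand), step):
        window = demand[start:start + step]
        total += max(window) * len(window)
    return total

# Hypothetical demand in cores per minute for one job.
demand = [2, 2, 10, 10, 3, 1]
cluster_cores = 16                          # assumed dedicated cluster size

dedicated = cluster_cores * len(demand)     # max resource envelope
per_job = max_job_envelope(demand)          # max job envelope
elastic_fine = actual_job_envelope(demand, step=1)
elastic_coarse = actual_job_envelope(demand, step=2)
print(dedicated, per_job, elastic_fine, elastic_coarse)
```

The coarser the adaptation step, the further the billed envelope drifts from the actual demand toward the per-job peak.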
Price Sensitivity of a customer
depends on the value that a solution
generates for the customer.
Worst case example: solution is
required by law but does not
generate business value.
Best case example: solution itself is
a high-margin product sold by
customer.
workload
time
Result
Semantics
Many possible models for “good results”
Deterministic results
• Given inputs and chosen service operations fully determine results
Repeatable results
• While not fully pre-determined, rerunning a service request over the
same inputs will yield the same results (e.g. journaling of tie breaks)
Asymptotic results
• Over time, a service operation will yield closer approximations of the
ideal results (e.g., eventual consistency)
Best effort
• For some definition of effort, a service makes a best effort to yield the
desired result, but always returns the result it came up with
Time boxed
• Special best-effort case: do the best within an allotted time bound
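A time-boxed, asymptotic computation can be sketched with a series that gets closer to its limit the longer it runs. The example is illustrative and not tied to any service: `approximate_pi` and its `clock` parameter are hypothetical, and the clock is injectable so the behavior can be checked without depending on wall time.

```python
import time

def approximate_pi(deadline_seconds, clock=time.monotonic, max_terms=1_000_000):
    """Sum the Leibniz series for pi until the time box closes.

    Always returns the best value reached so far (best effort); the value
    approaches pi as more terms are added (asymptotic).
    """
    start = clock()
    total, terms = 0.0, 0
    while terms < max_terms:
        total += (-1) ** terms / (2 * terms + 1)
        terms += 1
        if clock() - start >= deadline_seconds:
            break                     # time box exhausted: return what we have
    return 4 * total, terms

value, terms_used = approximate_pi(0.01)
print(value, terms_used)
```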
A valid result of a service request is
one that meets the requestor’s
requirements.
Ideal definitions of validity (such as
mathematical ones) are often
“overkill” and sometimes
unattainable.
Practical definitions create a high-
dimensional engineering space.
Composing
Solutions
Security, Privacy
Tenancy Model
Performance Predictability
Scale vs. Price
Result Semantics
Management
Geographic and Geopolitical
Reliability
A closed solution can meet many
requirements by fiat.
A composed solution can still be
closed (hide its composition).
An openly composed solution
exposes its composition; here,
meeting many requirements is hard.
Composition
Composing solutions over cloud platform services
• In-house or Solution Integrator
   • Cloud-only or hybrid
   • Hide cloud platform as implementation detail
• Independent Solution Vendor
   • As above, but in addition:
   • Create multi-tenancy solution on top of platform
   • Create billing models, incl. abstraction of platform bills
• Independent Platform Vendor
   • As above, but in addition:
   • Create new platform abstractions over existing ones
   • A platform is open for third-party contributions (extensions) and solutions built atop
   • Enabling independent platform construction atop creates many hard transitive challenges
Composition takes components as is
and assembles them into larger
pieces.
This is different from lower-level
forms of source-code reuse.
For service composition, lower-level
forms are definitely excluded.
How to Encode
Compositions?
Hey, Clemens, why can’t I just encode all this stuff in …
… Python
… Scala
… Go
… R
… C#
… Java
… whatnot?
I mean, composition is just programming after all, no?
Composing
using
Languages
Instructions can be very low-level
(close to the machine’s primitive
operations)
Instructions can be very high-level
(close to the problem domain at
hand)
Most languages strike a balance
• Too low-level (limits audience,
limits target machines)
• Too high-level (limits audience,
limits problem domains)
Given a computer with
some primitive operations
and a problem to solve.
Formulate a composition of
instructions to the
computer that solve the
problem.
[Figure: axes Skills, Interest, Audience; regions Machine Specific, “General Purpose”, Domain Specific]
Audience-
Specific
Languages
Languages that strive to be “general
purpose” end up being not quite
right at most anything.
To compensate, such languages
develop a large arsenal of
specialized but overlapping
capabilities.
The ideal maximized audience is
subdued by complexity.
Larger audiences can be served with
simpler languages to either side of
the “general purpose” point.
Consider a variety of
personas that characterize
how groups of people get
their tasks done.
Consider a set of personas
that fall into comparable
needs/skills categories. Call
that an audience.
[Figure: axes Skills, Interest, Audience, Complexity; regions Machine Specific, “General Purpose”, Domain Specific, “Audience Specific”]
Domain-
Specific
Languages
Embedded or internal DSLs were the latest craze for a while
• Language-Integrated Query (LINQ) is a popular example
• Functional monadic query operators embedded as an expression
sub-language in general-purpose C#, even with its own syntax
Analyzability (static or dynamic) suggests the more limited
language be on the outside
• Opposite of LINQ style of languages (that embed a functional DSL
inside a general-purpose programming language)
• Example: U-SQL language that is essentially T-SQL DQL as the
outer layer and C# as the inner layer
DSLs come in two common shapes:
internal or embedded and external.
An internal DSL is embedded inside
a general-purpose language.
An external DSL is its own top-level
entity. Oddly, it may embed a
general-purpose language.
The Power of
Declaration –
Examples
Azure Resource Manager (ARM) templates
• Declarative composition of resources across services
• Repeatable deployment of solutions built over Azure platform services
Power Query expressions for Excel and Power BI
• Functional composition of dataflow across many data sources
• Dynamic analysis – pushes nested work out to smart data sources (like databases)
Azure Stream Analytics jobs
• Declarative job definition
• Functional composition of dataflow from N sources to M destinations
• Static analysis – guarantees repeatable, at-least-once results from streaming jobs
U-SQL scripts for Azure Data Lake
• Declarative job definition
• Functional composition of distributed & federated dataflow, incl. custom code
• Static analysis – determines distribution of work / federation of nested work
• Dynamic analysis – determines affinity of compute, failure masking tactics
Declarative Languages establish the
shape of the result in a form fully
amenable to static analysis (and
comprehension).
For functional programming folks:
think of a fixed universe of higher-
order functions that are
“understood” by the system (and
usually have distinctive syntax)
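The idea of a fixed universe of operators that the system "understands" can be sketched in a few lines. This is an illustrative toy, not how any of the products above work internally: the operators `where_eq` and `select` and the helpers `required_columns` and `run` are hypothetical. Because the plan is plain data, it can be analyzed before a single row is read.

```python
# A fixed universe of operators the "system" understands.
def where_eq(column, value):
    return ("where_eq", column, value)

def select(*columns):
    return ("select", columns)

def required_columns(plan):
    """Static analysis: which input columns does the plan read?

    Returns None when the plan may need every column.
    """
    needed = None
    for op, *args in reversed(plan):
        if op == "select":
            if needed is None:
                needed = set(args[0])
        elif op == "where_eq":
            if needed is not None:
                needed.add(args[0])
    return needed

def run(plan, rows):
    """Execute the declared plan step by step."""
    for op, *args in plan:
        if op == "where_eq":
            column, value = args
            rows = [r for r in rows if r[column] == value]
        elif op == "select":
            rows = [{c: r[c] for c in args[0]} for r in rows]
    return rows

plan = [where_eq("city", "Malaga"), select("name")]
rows = [{"name": "Ana", "city": "Malaga", "age": 30},
        {"name": "Bo", "city": "Bonn", "age": 41}]
print(required_columns(plan), run(plan, rows))
```

Had the filter been an opaque function, the analysis could not tell that the `age` column is never needed.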
U-SQL as an
example of
declarative
power
U-SQL scripts for Azure Data Lake
• Cost
Compile-time partition elimination
• Predictable execution
Compile-time per-vertex memory determination
Compiler (and Optimizer, Code Gen) runs for every submission
• Performance
Optimizer-time plan building for scale-out and staged pipelining
All-of-topology optimization (not just all-of-stage)
Example: predicates pushed through (r/o annotated) custom code
Native code gen around arbitrary custom code
• Security
Compile-time separation of trusted from untrusted (custom) code,
deployment into segregated containers
Declaration of all-of-topology
semantics as a dataflow graph
Hosting of custom code in well-
defined roles inside that graph
Azure Data Lake
Analytics
U-SQL Scripts
U-SQL
• Natively unifies SQL’s declarative nature and C#’s general power
   • Metadata service keeps “assembly” definitions
   • Assembly: collection of uploaded files
   • Custom code is at least a .NET assembly with a public method
   • Custom code can then spawn processes and load other code; for example: spin up a Python runtime with libraries and run a Python script
   • Built-in support for R and Python as well as a range of cognitive functions
• Unifies querying structured and unstructured data
   • Unstructured data: Schema on read
   • Structured data: Metadata service keeps schema
• Unifies local (distributed) and remote (federated) queries
   • Federate T-SQL queries to Azure SQL DB, Azure SQL DW, or to SQL Server on VMs
A U-SQL Script uses an outer
language of T-SQL (DDL and DQL)
to host an inner imperative
language (C#).
The outer declarative language is
used to automatically scale and
parallelize the inner islands of
custom code.
U-SQL
Extensibility
Extensibility at many levels, capturing semantic intent
• C# expressions in SELECT statements
• User-defined functions (UDFs)
• User-defined aggregates (UDAggs)
• User-defined operators (UDOs) – several kinds
Remember: T-SQL DML/DQL on
the outside, C# on the inside.
C# abstractions are also the basis
for extensibility.
User-Defined
Operators
User-Defined Extractors
• Extract streams of rows from input sources
User-Defined Outputters
• Serialize results and send to output targets
User-Defined Processors
• Take one row and produce one row
• Pass-through versus transforming
User-Defined Appliers
• Take one row and produce 0 to n rows
• Used with OUTER/CROSS APPLY
User-Defined Combiners
• Combine rowsets (like a user-defined join)
User-Defined Reducers
• Take n rows and produce 1 row
Scaled out with explicit U-SQL
Syntax that takes a UDO instance
(created as part of the execution):
EXTRACT
OUTPUT
PROCESS
COMBINE
REDUCE
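The row-count contracts of three operator kinds (processor: one row in, one row out; applier: one row in, zero to n rows out; reducer: n rows in, one row out) can be mimicked in plain Python. This is only an analogue of the shapes described on the slide, not the U-SQL C# interfaces; all function names below are hypothetical.

```python
def process(rows, fn):
    """Processor: exactly one output row per input row."""
    return [fn(row) for row in rows]

def apply_rows(rows, fn):
    """Applier: zero to n output rows per input row."""
    return [out for row in rows for out in fn(row)]

def reduce_groups(rows, key, fn):
    """Reducer: collapse each group of n rows into one row."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return [fn(k, group) for k, group in groups.items()]

rows = [{"user": "a", "tags": "x;y"},
        {"user": "b", "tags": ""},
        {"user": "a", "tags": "z"}]

# Processor: transform each row, keeping the row count.
upper = process(rows, lambda r: {**r, "user": r["user"].upper()})

# Applier: explode the tag list; a row without tags yields nothing.
exploded = apply_rows(
    rows,
    lambda r: [{"user": r["user"], "tag": t} for t in r["tags"].split(";") if t],
)

# Reducer: one summary row per user.
counts = reduce_groups(exploded, "user", lambda k, g: {"user": k, "n": len(g)})
print(upper, exploded, counts)
```

Because each kind declares its row-count contract, a surrounding system can decide how to partition the input: per row for processors and appliers, per key group for reducers.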
U-SQL
Metadata
Object Model
[Diagram: U-SQL metadata object model]
Hierarchy: ADLA Account/Catalog, Database, Schema (multiplicities shown: [1,n], [1,n], [0,n])
Objects: Tables, Views, TVFs, Procedures, Table Types, External tables, Clustered Index, Partitions, Statistics, Data Source, Credentials, C# Assemblies, C# Fns, C# UDAgg, C# UDTs, C# Extractors, C# Processors, C# Appliers, C# Reducers, C# Combiners, C# Outputters, Other resources
Legend: Abstract objects, User objects; relations: Refers to, Contains, Implemented and named by; names: MD Name, C# Name
U-SQL Language Construction
Declarative Query and Transformation Language:
• Uses SQL’s SELECT FROM WHERE with GROUP
BY/Aggregation, Joins, SQL Analytics functions
• Optimizable, Scalable
Expression-flow programming style:
• Easy to use dataflow composition
• Composable, globally optimizable
Operates on Unstructured & Structured Data
• Schema on read over files
• Relational metadata objects (e.g. database, table)
Extensible from the ground up:
• Type system is based on C#
• Expression language IS C#
• User-defined functions (U-SQL and C#)
• User-defined Aggregators (C#)
• User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out
Framework for User code
• EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER,
COMBINER, APPLIER
Federated query across external data sources
// Make custom code (MyAgg, MyNamespace, MyData) available to the script.
REFERENCE ASSEMBLY MyDB.MyAssembly;

CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );

// Schema on read: extract typed rows from the raw input files.
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();

@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

// Per-customer order summary; odate and amount come from the orders rowset @o.
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
           && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;

// Write the result with a custom outputter, then persist it in table T.
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();

INSERT INTO T SELECT * FROM @j;
Designed for
Massive Scale
JOIN operators
INNER JOIN
LEFT, RIGHT, or FULL OUTER JOIN
CROSS JOIN
SEMIJOIN equivalent to IN subquery
ANTISEMIJOIN equivalent to NOT IN subquery
Language constraints steer user towards parallelizable patterns
• ON clause comparisons need to be of the simple form (“equijoin”):
rowset.column == rowset.column
or AND conjunctions of the simple equality comparison
• If a comparand is not a column, wrap it into a column in a previous SELECT
• If the comparison operation is not ==, put it into the WHERE clause
• Turn the join into a CROSS JOIN if no equality comparison
U-SQL is the product form of the
Microsoft-internal Scope.
Runs big parts of the business on
hundreds of thousands of machines.
Single jobs easily expand to run on
thousands of machines.
The U-SQL language is constrained to
steer users towards patterns that can
be parallelized. Example here: Joins.
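Why the equijoin restriction helps parallelization can be sketched as follows. With an equality comparison on the join key, both inputs can be hash-partitioned on that key and each pair of partitions joined independently. This is an illustrative sketch, not how the U-SQL runtime is implemented; the function names and sample rows are hypothetical.

```python
def hash_partition(rows, key, n):
    """Send each row to the partition chosen by the hash of its join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def local_join(left, right, key):
    """Plain hash join of two row lists on an equality condition."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def partitioned_equijoin(left, right, key, n=4):
    """Join partition i of the left input only with partition i of the right."""
    out = []
    pairs = zip(hash_partition(left, key, n), hash_partition(right, key, n))
    for lpart, rpart in pairs:        # each pair could run on its own machine
        out.extend(local_join(lpart, rpart, key))
    return out

customers = [{"cid": 1, "name": "Ana"}, {"cid": 2, "name": "Bo"},
             {"cid": 3, "name": "Cy"}]
orders = [{"cid": 1, "oid": 10}, {"cid": 1, "oid": 11},
          {"cid": 3, "oid": 12}, {"cid": 4, "oid": 13}]

joined = partitioned_equijoin(customers, orders, "cid")
print(sorted(joined, key=lambda r: r["oid"]))
```

A comparison such as `<` offers no such partitioning key: matching rows could sit in any pair of partitions, which is why non-equality conditions are moved to the WHERE clause or expressed as a CROSS JOIN.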
U-SQL
Compilation
Run before every execution to
leverage actual input data
characteristics.
Analysis of entire U-SQL script as
well as metadata from data sources.
Elimination of empty partitions.
Splitting into pipelineable steps.
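Partition elimination from metadata alone can be sketched like this. It is a hedged illustration: the `Partition` record, its fields, and the date-range predicate are assumptions made for the example, not the actual catalog format.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    row_count: int
    min_date: str      # ISO dates compare correctly as strings
    max_date: str

def partitions_to_read(partitions, date_from, date_to):
    """Use metadata only to decide which partitions a job must touch."""
    keep = []
    for p in partitions:
        if p.row_count == 0:
            continue                  # empty partition: nothing to read
        if p.max_date < date_from or p.min_date > date_to:
            continue                  # cannot contain matching rows
        keep.append(p.name)
    return keep

catalog = [
    Partition("p2017_03", 1200, "2017-03-01", "2017-03-31"),
    Partition("p2017_04", 0, "2017-04-01", "2017-04-30"),
    Partition("p2017_05", 900, "2017-05-01", "2017-05-31"),
]
selected = partitions_to_read(catalog, "2017-04-15", "2017-05-10")
print(selected)
```

Running this step before every execution is what lets the decision reflect the data as it actually is at submission time.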
U-SQL
Optimization
Run after every compilation.
Builds physical execution graph.
Groups pipelineable steps into
stages.
Stages are scaled out to execute
over a chosen number of vertices,
influenced by input sharding, stages
before and after.
Per-job and user-driven level of
parallelization.
Tooling:
Detailed visibility into execution steps,
for debugging.
Heatmap-like functionality to identify
performance bottlenecks.
Q & A
(Yes, we are Hiring!)
Declare Victory with Big Data

More Related Content

What's hot

Multi Tenancy In The Cloud
Multi Tenancy In The CloudMulti Tenancy In The Cloud
Multi Tenancy In The Cloudrohit_ainapure
 
Ensuring data storage security in cloud computing
Ensuring data storage security in cloud computingEnsuring data storage security in cloud computing
Ensuring data storage security in cloud computingUday Wankar
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architectureAdeel Javaid
 
Saa s multitenant database architecture
Saa s multitenant database architectureSaa s multitenant database architecture
Saa s multitenant database architecturemmubashirkhan
 
Best Practice Public Cloud Security
Best Practice Public Cloud SecurityBest Practice Public Cloud Security
Best Practice Public Cloud SecurityJason Singh
 
Insurtech, Cloud and Cybersecurity - Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity -  Chartered Insurance InstituteInsurtech, Cloud and Cybersecurity -  Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity - Chartered Insurance InstituteHenrique Centieiro
 
Community IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration PlanningCommunity IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration PlanningCommunity IT Innovators
 
Multi-tenancy in Private Clouds
Multi-tenancy in Private CloudsMulti-tenancy in Private Clouds
Multi-tenancy in Private CloudsPatrick Nicolas
 
Cloud Computing: Architecture, IT Security and Operational Perspectives
Cloud Computing: Architecture, IT Security and Operational PerspectivesCloud Computing: Architecture, IT Security and Operational Perspectives
Cloud Computing: Architecture, IT Security and Operational PerspectivesMegan Eskey
 
Presentation on Openstack in null Bhopal Chapter
Presentation on Openstack in null Bhopal ChapterPresentation on Openstack in null Bhopal Chapter
Presentation on Openstack in null Bhopal ChapterHemraj Singh Chouhan
 
Building Multi-tenant SaaS Applications using WSO2 Private PaaS
Building Multi-tenant SaaS Applications using WSO2 Private PaaSBuilding Multi-tenant SaaS Applications using WSO2 Private PaaS
Building Multi-tenant SaaS Applications using WSO2 Private PaaSSameera Jayasoma
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLASLA-Ready Network
 
Migrating national services to the Cloud
Migrating national services to the CloudMigrating national services to the Cloud
Migrating national services to the CloudMike Jones
 
AWS&Deloitte Blockchain
AWS&Deloitte BlockchainAWS&Deloitte Blockchain
AWS&Deloitte BlockchainAlé Flores
 
How to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and TrustHow to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and TrustApcera
 
Hybrid Cloud Solutions (with Datapipe)
Hybrid Cloud Solutions (with Datapipe)Hybrid Cloud Solutions (with Datapipe)
Hybrid Cloud Solutions (with Datapipe)RightScale
 
Multi-tenancy In the Cloud
Multi-tenancy In the CloudMulti-tenancy In the Cloud
Multi-tenancy In the Cloudsdevillers
 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAmazon Web Services
 

What's hot (20)

Multi Tenancy In The Cloud
Multi Tenancy In The CloudMulti Tenancy In The Cloud
Multi Tenancy In The Cloud
 
Ensuring data storage security in cloud computing
Ensuring data storage security in cloud computingEnsuring data storage security in cloud computing
Ensuring data storage security in cloud computing
 
Cloud architecture
Cloud architectureCloud architecture
Cloud architecture
 
Saa s multitenant database architecture
Saa s multitenant database architectureSaa s multitenant database architecture
Saa s multitenant database architecture
 
Best Practice Public Cloud Security
Best Practice Public Cloud SecurityBest Practice Public Cloud Security
Best Practice Public Cloud Security
 
Insurtech, Cloud and Cybersecurity - Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity -  Chartered Insurance InstituteInsurtech, Cloud and Cybersecurity -  Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity - Chartered Insurance Institute
 
Community IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration PlanningCommunity IT Webinar - Cloud Migration Planning
Community IT Webinar - Cloud Migration Planning
 
Multi-tenancy in Private Clouds
Multi-tenancy in Private CloudsMulti-tenancy in Private Clouds
Multi-tenancy in Private Clouds
 
Cloud Computing: Architecture, IT Security and Operational Perspectives
Cloud Computing: Architecture, IT Security and Operational PerspectivesCloud Computing: Architecture, IT Security and Operational Perspectives
Cloud Computing: Architecture, IT Security and Operational Perspectives
 
Presentation on Openstack in null Bhopal Chapter
Presentation on Openstack in null Bhopal ChapterPresentation on Openstack in null Bhopal Chapter
Presentation on Openstack in null Bhopal Chapter
 
Building Multi-tenant SaaS Applications using WSO2 Private PaaS
Building Multi-tenant SaaS Applications using WSO2 Private PaaSBuilding Multi-tenant SaaS Applications using WSO2 Private PaaS
Building Multi-tenant SaaS Applications using WSO2 Private PaaS
 
Preparing for Multi-Cloud
Preparing for Multi-CloudPreparing for Multi-Cloud
Preparing for Multi-Cloud
 
“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA“Tools” and Standards for Cloud-SLA
“Tools” and Standards for Cloud-SLA
 
Cs6703 grid and cloud computing unit 5
Cs6703 grid and cloud computing unit 5Cs6703 grid and cloud computing unit 5
Cs6703 grid and cloud computing unit 5
 
Migrating national services to the Cloud
Migrating national services to the CloudMigrating national services to the Cloud
Migrating national services to the Cloud
 
AWS&Deloitte Blockchain
AWS&Deloitte BlockchainAWS&Deloitte Blockchain
AWS&Deloitte Blockchain
 
How to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and TrustHow to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and Trust
 
Hybrid Cloud Solutions (with Datapipe)
Hybrid Cloud Solutions (with Datapipe)Hybrid Cloud Solutions (with Datapipe)
Hybrid Cloud Solutions (with Datapipe)
 
Multi-tenancy In the Cloud
Multi-tenancy In the CloudMulti-tenancy In the Cloud
Multi-tenancy In the Cloud
 
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYCAWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
AWS Partner Presentation - Datapipe - Deploying Hybrid IT, AWS Summit 2012 - NYC
 

Similar to Declare Victory with Big Data

Cloud strategy briefing 101
Cloud strategy briefing 101 Cloud strategy briefing 101
Cloud strategy briefing 101 Predrag Mitrovic
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformDATAVERSITY
 
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014Amazon Web Services
 
Cloud Computing for Small & Medium Businesses
Cloud Computing for Small & Medium BusinessesCloud Computing for Small & Medium Businesses
Cloud Computing for Small & Medium BusinessesAl Sabawi
 
Introduction To Cloud Computing By Beant Singh Duggal
Introduction To Cloud Computing By Beant Singh DuggalIntroduction To Cloud Computing By Beant Singh Duggal
Introduction To Cloud Computing By Beant Singh DuggalBeantsingh
 
Microservices for Application Modernisation
Microservices for Application ModernisationMicroservices for Application Modernisation
Microservices for Application ModernisationAjay Kumar Uppal
 
AWS Sydney Summit 2013 - Big Data Analytics
AWS Sydney Summit 2013 - Big Data AnalyticsAWS Sydney Summit 2013 - Big Data Analytics
AWS Sydney Summit 2013 - Big Data AnalyticsAmazon Web Services
 
Financial impact of Cloud Computing
Financial impact of Cloud ComputingFinancial impact of Cloud Computing
Financial impact of Cloud Computingkrisbliesner
 
Todays_Cloud_Strategies_100818.pptx
Todays_Cloud_Strategies_100818.pptxTodays_Cloud_Strategies_100818.pptx
Todays_Cloud_Strategies_100818.pptxMOKTARBAKAR2
 
Karrox introduction to cloud computing
Karrox introduction to cloud computingKarrox introduction to cloud computing
Karrox introduction to cloud computingKarrox Franchise
 
Wicsa2011 cloud tutorial
Wicsa2011 cloud tutorialWicsa2011 cloud tutorial
Wicsa2011 cloud tutorialAnna Liu
 
Nairobi OpenStack Meetup - July 2013
Nairobi OpenStack Meetup - July 2013Nairobi OpenStack Meetup - July 2013
Nairobi OpenStack Meetup - July 2013adamnelson
 
Cloud Computing By Pankaj Sharma
Cloud Computing By Pankaj SharmaCloud Computing By Pankaj Sharma
Cloud Computing By Pankaj SharmaRanjan Kumar
 
SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview SMAC - Social, Mobile, Analytics and Cloud - An overview
SMAC - Social, Mobile, Analytics and Cloud - An overview Rajesh Menon
 
Cloud Computing - Security Benefits and Risks
Cloud Computing - Security Benefits and RisksCloud Computing - Security Benefits and Risks
Cloud Computing - Security Benefits and RisksWilliam McBorrough
 
Strata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Strata SC 2014: Apache Mesos as an SDK for Building Distributed FrameworksStrata SC 2014: Apache Mesos as an SDK for Building Distributed Frameworks
Strata SC 2014: Apache Mesos as an SDK for Building Distributed FrameworksPaco Nathan
 
Cloud Computing Overview
Cloud Computing OverviewCloud Computing Overview
Cloud Computing OverviewManju Srinivas
 
Data Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud WorldData Virtualization to Survive a Multi and Hybrid Cloud World
Declare Victory with Big Data

  • 1. Declare Victory with Big Data. Clemens Szyperski, Principal Group Software Engineering Manager, Big Data Analytics. Malaga, Spain, May 2017.
  • 2. Who is Clemens? Masters in EE: RWTH Aachen. PhD in CS: ETH Zurich (under Niklaus Wirth). Post Doc: ICSI at UC Berkeley. Co-founder of Oberon microsystems and Myriad Group. Assoc/Prof CS at Queensland UT, Brisbane, Australia. Architect, Developer, Lead, Manager, Group Manager in Research, Office, Connected Systems, Data Group at Microsoft since 1999. Languages: Oberon 2, Sather 2, Component Pascal, Mianjin, PQEL (M), SA-QL, U-SQL. Systems: Ethos, Tenet 2, Gardens, Blackbox Component Builder, Project Oslo, .Net Managed Extensibility Framework, Power Query (in Excel & Power BI), Azure Stream Analytics, Azure Data Lake Analytics, Azure Time Series Insights.
  • 3. Big Data and Machine Learning / AI. Big Data means: a lot of data (volume); crazy shapes (variety); incoming!! (velocity). Analytics (incl. ML/AI) over Big Data means: compute at massive scale; complexity; fault tolerance. XKCD license.
  • 4. The Big Data Explosion (figure). Axes: Data Size (GB, TB, PB, EB, …) against Data Complexity (variety and velocity). Big Data: log files, spatial & GPS coordinates, data market feeds, eGov feeds, weather, text/image, click stream, wikis/blogs, sensors/RFID/devices, social sentiment, audio/video. Web 2.0: web logs, digital marketing, search marketing, recommendations, advertising, mobile, collaboration, eCommerce. ERM and CRM: payables, payroll, inventory, contacts, deal tracking, sales pipeline. Data size prefixes: yotta (Y) = 1000^8 = 10^24 (septillion); zetta (Z) = 1000^7 = 10^21 (sextillion); exa (E) = 1000^6 = 10^18 (quintillion); peta (P) = 1000^5 = 10^15 (quadrillion); tera (T) = 1000^4 = 10^12 (trillion); giga (G) = 1000^3 = 10^9 (billion); mega (M) = 1000^2 = 10^6 (million).
  • 5. Growth of data (figure; time axis from 1985 to 2020). Labels: Analog, Digital, Internet Connected, Mobile, Cloud.
  • 6. All data generated: schema agility AND experimentation AND ML, image processing, graph, streaming. Operational data: highly modeled schema, relational algebra.
  • 7. Examples: Big Data at Microsoft. Cosmos and Scope: rooted in Dryad; a decade of development. Productized as Azure Data Lake: ADL Store; ADL Analytics with U-SQL. Kafka Four Commas Club (ingestion of a trillion+ events/day). Cosmos stores exabytes of active data. Scope processes hundreds of petabytes a day. Supports batch, interactive, streaming, ML. Data ranges across most Microsoft products: Bing and MSN click streams, Office and Windows telemetry, Xbox gaming. Cosmos comprises hundreds of thousands of machines, millions of cores, petabytes of RAM, exabytes of disks. Still only a fraction of the global Microsoft Cloud.
  • 8. Demo: Applying ML and Cognition at Scale.
  • 9. How can you leverage Big Data? Use the power of public cloud services to move beyond: hardware lifecycles; infrastructure management; physical and cyber security infrastructure; inflexible demand/scale/cost structures; inadequate geo reach. Well, it can be fairly simple, actually.
  • 10. Full Service. A family of services designed for composition is called Platform as a Service. Contrast with low-level building blocks (VMs, storage, network): Infrastructure as a Service. Contrast also with finished solution services: Software as a Service. Composition units are fully deployed and operated instances. Contrast this with source reuse and with software components. Litmus test: Can you ask “who pays the power bill?” Cloud services allow users to shed the cost of operations, enable users to stay on top of software and hardware trends, and virtualize physical resources of extreme capacity.
  • 11. Who does What. Note that private clouds aim for many of the same goals, but on top of resources deployed and controlled on a customer’s premises. Azure Stack enables such private clouds while retaining much of the Azure model. Key Cloud Value Proposition: Separation of Responsibilities. Stack layers (the same in every model): Applications, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking. Models: On Premises; Infrastructure (as a Service); Platform (as a Service); Software (as a Service). Legend: Customer Manages; Provider Manages.
  • 12. Service Quality. Qualities: Security, Privacy; Tenancy Model; Performance Predictability; Scale vs. Price; Result Semantics; Management; Geographic and Geopolitical; Reliability. There are many qualities that characterize a service. The total design space for services is overconstrained if all qualities are equally important. Understanding tradeoffs is thus essential, and it defines the service engineering discipline.
  • 13. Security, Privacy. • Encryption: data at rest (or always); service- vs. user-managed keys • Authentication: two-factor authentication; Azure Active Directory (AAD) integration; federation with other providers and on-premises systems (AD) • Authorization: role definitions, user-to-role assignments (RBAC); managed shareable access keys (SAS, OAuth) • DoS protection, attack and intrusion detection • Network isolation: all IP endpoints that are internal to a solution, on premises or in cloud, are not exposed on the Internet (VNets: virtual networks). Is the data secured? Is the solution IP secured? Is the service quality secured?
  • 14. Performance Predictability. Spectrum of predictability based on willingness to pay: • Static pre-allocation of fully dedicated resources • Based on conservative static calculation of resource needs • Based on requested resources, typically based on estimates grounded in historic observation • Either fails early or will run with promised resources • Can be undermined by failures of underlying infrastructure • Dynamic allocation of needed resources • Risk of out-of-resource rejection mid-course • Dynamic sharing of resources • Risk of noisy neighbor impact • Can use quota enforcement policies to keep individual resource consumers within bounds • Typically subject to overbooking policies to ensure high level of resource utilization (and thus delivery at low cost). Common performance metrics: Throughput is the data processed per time unit. Latency is the time between earliest possible and actual delivery of results. Metrics are usually observed relative to benchmark or actual workloads.
  • 15. Tenancy Model. Single-Tenant Services: • Dedicated and isolated resources are granted to a tenant • Example: dedicated clusters (VM sets with cluster-level features for management, monitoring, etc.). Multi-Tenant Services: • Resources are shared among tenants, which is simpler and cheaper • Example: job-execution service • Tenant isolation is fine-grained • Security bar is more difficult to uphold • Predictable performance often impacted by “noisy neighbors” • Performance irregularities caused by heavy use of shared resources by some other users. A tenant is a logical customer organization. Multiple users may authenticate under the same tenant. A large customer may have multiple tenancies.
  • 16. Scale vs. Price. One Size will never Fit All: • Optimizing for most desirable qualities (high availability, reliability, predictability, security, …) will counteract optimizing for price. Average vs. Peak: • Max resource envelope scale (like dedicated clusters) • Max job envelope scale (auto-resourcing per job submission) • Actual job envelope scale (elastic resourcing over duration of job) • Likely discretized to max. elastic adaptation rate. Price sensitivity of a customer depends on the value that a solution generates for the customer. Worst case example: solution is required by law but does not generate business value. Best case example: solution itself is a high-margin product sold by customer. (Figure axes: workload over time.)
  • 17. Result Semantics. Many possible models for “good results”. Deterministic results: • Given inputs and chosen service operations fully determine results. Repeatable results: • While not fully pre-determined, rerunning a service request over the same inputs will yield the same results (e.g. journaling of tie breaks). Asymptotic results: • Over time, a service operation will yield closer approximations of the ideal results (e.g., eventual consistency). Best effort: • For some definition of effort, a service makes a best effort to yield the desired result, but always returns the result it came up with. Time boxed: • Special best-effort case: do the best within an allotted time bound. A valid result of a service request is one that meets the requestor’s requirements. Ideal definitions of validity (such as mathematical ones) are often “overkill” and sometimes unattainable. Practical definitions create a high-dimensional engineering space.
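The "time boxed" and "asymptotic" models on this slide can be illustrated with a small sketch. It is not taken from the talk: the helper names `time_boxed` and `newton_sqrt2_step` are hypothetical, and the injectable `clock` parameter is an assumption made so the behavior can be checked deterministically.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def time_boxed(refine: Callable[[T], T], initial: T, budget_s: float,
               clock: Callable[[], float] = time.monotonic) -> T:
    """Refine `initial` repeatedly until the time budget is used up.

    Best effort: always returns whatever result was reached, even if
    no refinement step fit into the budget.
    """
    deadline = clock() + budget_s
    result = initial
    while clock() < deadline:
        result = refine(result)
    return result


def newton_sqrt2_step(x: float) -> float:
    # One Newton iteration towards sqrt(2): each step is a closer
    # approximation, i.e. an "asymptotic" result.
    return 0.5 * (x + 2.0 / x)


if __name__ == "__main__":
    # With a real clock the number of steps (and so the exact value
    # in early iterations) depends on the machine: not deterministic.
    print(time_boxed(newton_sqrt2_step, 1.0, budget_s=0.001))
```

With the real clock the result is valid under the time-boxed definition but is neither deterministic nor repeatable in the slide's sense, since the number of refinement steps depends on how fast the machine runs.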
  • 18. Composing Solutions. Qualities: Security, Privacy; Tenancy Model; Performance Predictability; Scale vs. Price; Result Semantics; Management; Geographic and Geopolitical; Reliability. A closed solution can meet many requirements by fiat. A composed solution can still be closed (hide its composition). An openly composed solution exposes its composition; here, meeting many requirements is hard.
  • 19. Composition. Composing solutions over cloud platform services. In-house or Solution Integrator: cloud-only or hybrid; hide cloud platform as implementation detail. Independent Solution Vendor: as above, but in addition: create multi-tenancy solution on top of platform; create billing models, incl. abstraction of platform bills. Independent Platform Vendor: as above, but in addition: create new platform abstractions over existing ones; a platform is open for third-party contributions (extensions) and solutions built atop; enabling independent platform construction atop creates many hard transitive challenges. Composition takes components as is and assembles them into larger pieces. This is different from lower-level forms of source-code reuse. For service composition, lower-level forms are definitely excluded.
  • 20. How to Encode Compositions? Hey, Clemens, why can’t I just encode all this stuff in … … Python … Scala … Go … R … C# … Java … whatnot? I mean, composition is just programming after all, no?
  • 21. Composing using Languages. Given a computer with some primitive operations and a problem to solve: formulate a composition of instructions to the computer that solves the problem. Instructions can be very low-level (close to the machine’s primitive operations). Instructions can be very high-level (close to the problem domain at hand). Most languages strike a balance: too low-level limits audience and target machines; too high-level limits audience and problem domains. Figure labels: Skills, Interest, Audience; Machine Specific, “General Purpose”, Domain Specific.
  • 22. Audience-Specific Languages. Consider a variety of personas that characterize how groups of people get their tasks done. Consider a set of personas that fall into comparable needs/skills categories. Call that an audience. Languages that strive to be “general purpose” end up being not quite right at most anything. To compensate, such languages develop a large arsenal of specialized but overlapping capabilities. The ideal maximized audience is subdued by complexity. Larger audiences can be served with simpler languages to either side of the “general purpose” point. Figure labels: Skills, Interest, Audience, Complexity; Machine Specific, “General Purpose”, Domain Specific, “Audience Specific”.
  • 23. Domain-Specific Languages. DSLs come in two common shapes: internal (embedded) and external. An internal DSL is embedded inside a general-purpose language. An external DSL is its own top-level entity. Oddly, it may embed a general-purpose language. Embedded or internal DSLs were the latest craze for a while: Language-Integrated Query (LINQ) is a popular example, with functional monadic query operators embedded as an expression sub-language in general-purpose C#, even with its own syntax. Analyzability (static or dynamic) suggests the more limited language be on the outside: the opposite of LINQ-style languages (that embed a functional DSL inside a general-purpose programming language). Example: the U-SQL language, which is essentially T-SQL DQL as the outer layer and C# as the inner layer.
  • 24. The Power of Declaration: Examples. Declarative languages establish the shape of the result in a form fully amenable to static analysis (and comprehension). For functional programming folks: think of a fixed universe of higher-order functions that are “understood” by the system (and usually have distinctive syntax). Azure Resource Manager (ARM) templates: declarative composition of resources across services; repeatable deployment of solutions built over Azure platform services. Power Query expressions for Excel and Power BI: functional composition of dataflow across many data sources; dynamic analysis pushes nested work out to smart data sources (like databases). Azure Stream Analytics jobs: declarative job definition; functional composition of dataflow from N sources to M destinations; static analysis guarantees repeatable, at-least-once results from streaming jobs. U-SQL scripts for Azure Data Lake: declarative job definition; functional composition of distributed & federated dataflow, incl. custom code; static analysis determines distribution of work / federation of nested work; dynamic analysis determines affinity of compute and failure masking tactics.
  • 25. U-SQL as an example of declarative power. U-SQL scripts for Azure Data Lake: • Cost: compile-time partition elimination • Predictable execution: compile-time per-vertex memory determination; compiler (and optimizer, code gen) runs for every submission • Performance: optimizer-time plan building for scale-out and staged pipelining; all-of-topology optimization (not just all-of-stage); example: predicates pushed through (r/o annotated) custom code; native code gen around arbitrary custom code • Security: compile-time separation of trusted from untrusted (custom) code, deployment into segregated containers. Declaration of all-of-topology semantics as a dataflow graph. Hosting of custom code in well-defined roles inside that graph.
  • 26. Azure Data Lake Analytics U-SQL Scripts. U-SQL natively unifies SQL’s declarative nature and C#’s general power. The metadata service keeps “assembly” definitions; an assembly is a collection of uploaded files. Custom code is at least a .NET assembly with a public method. Custom code can then spawn processes and load other code; for example, spin up a Python runtime with libraries and run a Python script. Built-in support for R and Python as well as a range of cognitive functions. Unifies querying structured and unstructured data: unstructured data uses schema on read; for structured data, the metadata service keeps the schema. Unifies local (distributed) and remote (federated) queries: federate T-SQL queries to Azure SQL DB, Azure SQL DW, or SQL Server on VMs. A U-SQL script uses an outer language of T-SQL (DDL and DQL) to host an inner imperative language (C#). The outer declarative language is used to automatically scale and parallelize the inner islands of custom code.
  • 27. U-SQL Extensibility. Extensibility at many levels, capturing semantic intent: C# expressions in SELECT statements; user-defined functions (UDFs); user-defined aggregates (UDAggs); user-defined operators (UDOs), of several kinds. Remember: T-SQL DML/DQL on the outside, C# on the inside. C# abstractions are also the basis for extensibility.
  • 28. User-Defined Operators. User-Defined Extractors: extract streams of rows from input sources. User-Defined Outputters: serialize results and send them to output targets. User-Defined Processors: take one row and produce one row; pass-through versus transforming. User-Defined Appliers: take one row and produce 0 to n rows; used with OUTER/CROSS APPLY. User-Defined Combiners: combine rowsets (like a user-defined join). User-Defined Reducers: take n rows and produce 1 row. Scaled out with explicit U-SQL syntax that takes a UDO instance (created as part of the execution): EXTRACT, OUTPUT, PROCESS, COMBINE, REDUCE.
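The row cardinalities of three of these operator kinds can be modeled with plain Python generators. This is only an illustration of the shapes described on the slide (processor: 1 to 1, applier: 1 to 0..n, reducer: n to 1 per key); the function names `process_rows`, `apply_rows`, and `reduce_rows` and the sample data are hypothetical and are not part of the U-SQL API, where UDOs are written as C# classes.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def process_rows(rows: Iterable[Row], fn: Callable[[Row], Row]) -> Iterator[Row]:
    """Processor shape: one row in, one row out."""
    for row in rows:
        yield fn(row)

def apply_rows(rows: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Applier shape: one row in, zero to n rows out."""
    for row in rows:
        yield from fn(row)

def reduce_rows(rows: Iterable[Row], key: str,
                fn: Callable[[object, List[Row]], Row]) -> Iterator[Row]:
    """Reducer shape: all rows sharing a key in, one row out per key."""
    groups: Dict[object, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for k, group in groups.items():
        yield fn(k, group)

# Hypothetical call-center rows: each call carries a comma-separated tag list.
calls = [
    {"cid": 1, "tags": "billing,outage"},
    {"cid": 2, "tags": ""},
    {"cid": 1, "tags": "refund"},
]

# Processor: normalize the tags column (transforming, not pass-through).
upper = process_rows(calls, lambda r: {**r, "tags": str(r["tags"]).upper()})

# Applier: explode each call into one row per non-empty tag (0..n rows per input).
tags = apply_rows(
    upper,
    lambda r: ({"cid": r["cid"], "tag": t} for t in str(r["tags"]).split(",") if t),
)

# Reducer: collapse all tag rows of a customer into a single count row.
counts = list(reduce_rows(tags, "cid", lambda k, g: {"cid": k, "n": len(g)}))
```

Because each stage only sees rows (or key groups) independently, a runtime is free to spread the work of each stage over many workers, which is the point of declaring the operator kind.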
  • 29. U-SQL Metadata Object Model. [Diagram: ADLA Account/Catalog, Database, and Schema, with multiplicities [1,n], [1,n], [0,n]. Objects shown: Tables, External tables, Views, TVFs, Procedures, Table Types, Clustered Index, Partitions, Statistics, Credentials, Data Source, C# Assemblies, C# Fns, C# UDAgg, C# UDTs, C# Extractors, C# Reducers, C# Processors, C# Combiners, C# Outputters, C# Appliers, Other resources. Legend: abstract objects, user objects; relations “refers to”, “contains”, “implemented and named by”; MD name, C# name.]
  • 30. U-SQL Language Construction. Declarative query and transformation language: uses SQL’s SELECT FROM WHERE with GROUP BY/aggregation, joins, and SQL analytics functions; optimizable, scalable. Expression-flow programming style: easy-to-use dataflow composition; composable, globally optimizable. Operates on unstructured and structured data: schema on read over files; relational metadata objects (e.g. database, table). Extensible from the ground up: the type system is based on C#; the expression language IS C#; user-defined functions (U-SQL and C#); user-defined aggregators (C#); user-defined operators (UDOs) (C#). U-SQL provides the parallelization and scale-out framework for user code: EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER. Federated query across external data sources. Example script:
      REFERENCE ASSEMBLY MyDB.MyAssembly;
      CREATE TABLE T( cid int, first_order DateTime, last_order DateTime, order_count int, order_amount float );
      @o = EXTRACT oid int, cid int, odate DateTime, amount float FROM "/input/orders.txt" USING Extractors.Csv();
      @c = EXTRACT cid int, name string, city string FROM "/input/customers.txt" USING Extractors.Csv();
      @j = SELECT c.cid, MIN(o.odate) AS firstorder, MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt, AGG<MyAgg.MySum>(o.amount) AS totalamount FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid WHERE c.city.StartsWith("New") && MyNamespace.MyFunction(o.odate) > 10 GROUP BY c.cid;
      OUTPUT @j TO "/output/result.txt" USING new MyData.Write();
      INSERT INTO T SELECT * FROM @j;
  • 31. Designed for Massive Scale. JOIN operators: INNER JOIN; LEFT, RIGHT, or FULL OUTER JOIN; CROSS JOIN; SEMIJOIN (equivalent to an IN subquery); ANTISEMIJOIN (equivalent to a NOT IN subquery). Language constraints steer the user towards parallelizable patterns: ON clause comparisons need to be of the simple form (“equijoin”) rowset.column == rowset.column, or AND conjunctions of such simple equality comparisons; if a comparand is not a column, wrap it into a column in a previous SELECT; if the comparison operation is not ==, put it into the WHERE clause; turn the join into a CROSS JOIN if there is no equality comparison. U-SQL is the product form of the Microsoft-internal Scope, which runs big parts of the business on hundreds of thousands of machines; single jobs easily expand to run on thousands of machines. The U-SQL language is constrained to steer users towards patterns that can be parallelized; joins are the example here.
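Why the equijoin restriction helps parallelization can be shown with a small Python sketch. This is not how the U-SQL runtime is implemented; it is a generic hash-partitioned inner join, with hypothetical names (`partition`, `local_join`, `partitioned_equijoin`) and sample data. Because matching rows must have equal keys, both inputs can be split by a hash of the key, and each pair of partitions can then be joined independently, for instance on separate machines.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]

def partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Route every row to one of n partitions by the hash of its join key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def local_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner equijoin of two co-located partitions using a hash index."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def partitioned_equijoin(left: List[Row], right: List[Row],
                         key: str, n: int = 4) -> List[Row]:
    """Equal keys land in the same partition, so partition pairs join independently."""
    out: List[Row] = []
    for lp, rp in zip(partition(left, key, n), partition(right, key, n)):
        out.extend(local_join(lp, rp, key))  # each pair could run on its own worker
    return out

# Hypothetical sample data.
customers = [
    {"cid": 1, "city": "New York"},
    {"cid": 2, "city": "Newark"},
    {"cid": 3, "city": "Oslo"},
]
orders = [{"cid": 1, "oid": 10}, {"cid": 1, "oid": 11}, {"cid": 3, "oid": 12}]

joined = partitioned_equijoin(customers, orders, "cid")
```

A comparison such as `<` has no such routing property: any left row might match any right row, so the work cannot be split by key. That is consistent with the slide's advice to move non-equality comparisons into the WHERE clause.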
  • 32. U-SQL Compilation. Runs before every execution to leverage actual input data characteristics. Analyzes the entire U-SQL script as well as metadata from data sources. Eliminates empty partitions. Splits the script into pipelineable steps.
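Partition elimination from metadata alone can be sketched as follows. The sketch assumes, purely for illustration, that the metadata exposes a row count and a min/max date per partition; the `PartitionInfo` type, its fields, the paths, and the `prune` function are hypothetical and do not describe the actual U-SQL catalog.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass(frozen=True)
class PartitionInfo:
    path: str        # hypothetical location of the partition
    row_count: int   # assumed to be available from metadata
    min_date: date   # assumed per-partition column statistics
    max_date: date

def prune(partitions: List[PartitionInfo], lo: date, hi: date) -> List[PartitionInfo]:
    """Keep only partitions that are non-empty and overlap the range [lo, hi]."""
    return [
        p for p in partitions
        if p.row_count > 0 and p.max_date >= lo and p.min_date <= hi
    ]

# Hypothetical monthly partitions of a log table.
parts = [
    PartitionInfo("/logs/2017-03", 1200, date(2017, 3, 1), date(2017, 3, 31)),
    PartitionInfo("/logs/2017-04", 0, date(2017, 4, 1), date(2017, 4, 30)),
    PartitionInfo("/logs/2017-05", 900, date(2017, 5, 1), date(2017, 5, 31)),
]

# A query restricted to mid-May only needs to read the May partition.
kept = prune(parts, date(2017, 5, 10), date(2017, 5, 20))
```

Since the decision uses only metadata, it can be made before any data is read, which is why running the compiler before every execution pays off.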
  • 33. U-SQL Optimization. Runs after every compilation. Builds the physical execution graph. Groups pipelineable steps into stages. Stages are scaled out to execute over a chosen number of vertices, influenced by input sharding and by the stages before and after. The level of parallelization is set per job and is user-driven. Tooling: detailed visibility into execution steps for debugging; heatmap-like functionality to identify performance bottlenecks.
  • 34. Q & A (Yes, we are hiring!)

Editor's Notes

  1. The power of top-level declarative: not just all-of-stage, but all-of-topology optimization. Example: push a predicate upstream, even through (certain) UDOs. The topology is computed by the system, not programmed by the developer (as it is in Storm, Spark, etc.). Both static and dynamic optimization. Sizing of resources to meet capacity needs is more predictable. The power of code inside defined “islands” within the declarative topology: UDx as points of defined extensibility; top-level coupling of C# into U-SQL; nested support for Python, R, Java. Thus: bring your favorite library, written for your favorite language, requiring your favorite runtime – and enjoy! Some caveats … Flow: Intro -> Big Data / Analytics -> First Demo -> Declarative + Code Power -> Q&A
  2. Show the 2.2 PB ADLS account with demo data for the customer support scenario (call center logs in the hundreds of gigabytes, plus related social media and telemetry logs). Show analysis over petabytes of data using U-SQL: show the solution “monitor” page; show the 2 PB ADLS dashboard; show the call center log in file preview; switch to the VM and show U-SQL and R snippets, incl. a simple job graph, and replay it. Switch to VS, show telemetry jobs, pick a 2h-running job, show its job graph and replay it, show the issues analysis (data skew), click the link to the help page.
  3. Use for language experts