Big Data on Tap
cask.co
rule Distributed-Rules-Engine-DRE {

description 'Presentation of Distributed Rules Engine (DRE)'

when(presenter =~ "Nitin Motgi" && title =~ "Co-founder & CTO" &&
datetime == today())

then {

welcome; 

present;

question-and-answer;

}

}
2
Who is Cask
• Founded in 2011 by early Hadoop engineers from Facebook and Yahoo!
• Raised $37+ Million: Andreessen Horowitz, Safeguard, Battery Ventures and Ignition Partners
• Strategic Investors: AT&T, Cloudera and Ericsson
• Unique Value Proposition: First Unified Integration Platform for rapid time-to-value from Big Data
• Key Customers & Partners: AT&T, Cloudera, Ericsson, Hortonworks, IBM, MapR, Microsoft, Salesforce, Thomson Reuters, …
• Mature Platform: CDAP 4, featuring Cask Market, the “big data app store”
• Why “Cask”? A Container Architecture that puts Big Data on Tap
3
What is Cask Data Application Platform (CDAP)?
A unified platform for building integrated data analytics tools and applications, and for delivering specialized frameworks and solutions that enable enterprises to rapidly extract value from data.
Platform layers: Runtime, APIs, Tools, Frameworks, Market
4
Pre-Built Tools and Frameworks
Data Prep: Framework
• Data preparation for on-boarding new sources and datasets
• Perform data transformations and data quality checks with visual feedback
• Extend Data Prep by building new user-defined directives
• Integrates with Data Pipeline for operationalizing transformations
Data Pipeline: Framework
• User interface for building complex data workflows
• Join, look up, aggregate and filter data in flight
• Build complex workflows with 100s of connectors
• Extend Data Pipeline using simple APIs
• Integrates with Data Prep, Rules Engine and Metadata Aggregator
5
Pre-Built Tools and Frameworks
Rules Engine: Tool
• Business data transformations and checks, codified for business users
• Define complex rules using an intuitive, simple-to-use user interface
• Logically group rules in a Rulebook and trigger or schedule processing
• Integrates with Data Pipeline for operationalizing rules
Metadata Aggregator: Tool
• Aggregate business, technical and operational metadata
• Track the flow of data (lineage) to provide the richer metadata needed for governance
• Create a data dictionary and metadata repository
• Integrates with enterprise MDM solutions
• Integrates with Data Pipeline and Rules Engine
6
Pre-Built Tools and Frameworks
Microservice: Framework
• Build specialized logic for processing data
• Create a loosely coupled network for processing events
• Connect microservices using Amazon SQS, WebSocket, MapR Streams and Kafka
Event Condition Action (ECA): Application
• Delivers a specialized solution for IoT event processing
• Parses any event, checks conditions and executes actions
• Real-time notification system with an easy-to-use user interface for configuring event parsing, conditions and actions
7
Realizing Value from Data
Data is Your Asset
• Empower Business Users
• On-Board New Data Sources and Types
• Integrate with Cloud and Enterprise Ecosystem
• Build Data Processing Workflows
• Integrate Machine Learning and Model Management
• Expand and Customize to build new Data Analytics Applications
Platform: Integration, Automate, Operationalize, Intelligence, Extend & Innovate
Cloud Connectors + Run on Public Cloud + Integrate with Existing Apps
Data Prep, Rules Engine, Data Pipelines, ML Integration
Business Value
8
What is a Rules Engine?
“Externalize the business logic that is dynamic”
“Make code intuitive and readable, easily understood by business people”
“Simplify complicated requirements with declarative logic, raising the level of abstraction”
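To make “externalize the business logic” concrete, here is a minimal Python sketch, not Cask DRE code: the rules live in a JSON document that could be edited outside the program, and a small generic engine interprets them. The rule ids, field names and JSON layout are hypothetical.

```python
import json
import operator

# Hypothetical externalized rules. In practice this text could live in a file
# or database maintained by business users, separate from the engine code.
RULES_JSON = """
[
  {"id": "Senior-Discount", "field": "age", "op": ">", "value": 65,
   "set": {"discount": 10}},
  {"id": "Bulk-Discount", "field": "items", "op": ">=", "value": 100,
   "set": {"discount": 5}}
]
"""

# The engine only knows how to compare values; it knows nothing about discounts.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def apply_rules(record, rules):
    """Return a copy of record with the assignments of every matching rule applied."""
    result = dict(record)
    for rule in rules:
        # Assumes the referenced field is present in the record.
        if OPS[rule["op"]](result[rule["field"]], rule["value"]):
            result.update(rule["set"])
    return result


rules = json.loads(RULES_JSON)
print(apply_rules({"age": 70, "items": 3}, rules))
# {'age': 70, 'items': 3, 'discount': 10}
```

Changing a threshold or adding a rule only means editing `RULES_JSON`; `apply_rules` stays untouched, which is the separation of code and logic the slide describes.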
9
Why Rules Engine?
Rules engines are a great way to collect all of the complex decision-making logic involved in data integration and to apply it to large datasets.
10
When to use a Rules Engine?
“When transformation logic changes often or new feeds are constantly being on-boarded into the Data Lake”
“When you want to involve non-technical domain experts in the ingestion and integration of Big Data”
“When you want to separate code and logic”
11
Current Data Transformation Flow
12
Introducing Cask Distributed Rules Engine (Cask DRE)
13
Data Transformation Flow with Cask DRE
• Involve business analysts or non-technical domain experts in Big Data
• Transform data using a declarative language instead of an imperative one
• Centralize data transformation into a self-documented knowledge base
• Real-time or batch: do it exactly the same way at scale on Hadoop or Spark
• Integrate with workflows to operationalize at scale
14
Components of Cask DRE
15
Rule Execution
Input Table
age   vin   model
32    …     van
45    …     sports
16    …     truck
17    …     sports
19    …     sedan
13    …     van
56    …     sedan
46    …     sports
28    …     van
• Record: an individual row in the table, made up of columns; each cell contains the value associated with its column.
• Working Set: each record is treated as the working set on which the DRE operates.
• DRE + Rule Repository: the DRE updates the working set based on the actions of the rules that fire, producing the Outcome.
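The execution model above (each record becomes a working set, rules from a repository fire against it, the survivors form the outcome) can be sketched in plain Python. This is a single-process illustration under stated assumptions, not the DRE's distributed implementation; the `Rule` class, the `execute` function and the `Young-Sports-Driver` rule are hypothetical, while the data comes from the input table (minus the elided vin column).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Rule:
    rule_id: str
    when: Callable[[Record], bool]                # LHS: test on the working set
    then: Callable[[Record], Optional[Record]]    # RHS: new working set, or None to drop it


def execute(table: List[Record], repository: List[Rule]) -> List[Record]:
    """Run every rule in the repository against each record's working set."""
    outcome = []
    for row in table:
        working_set: Optional[Record] = dict(row)   # each record is its own working set
        for rule in repository:
            if rule.when(working_set):
                working_set = rule.then(working_set)
                if working_set is None:             # the rule filtered the record out
                    break
        if working_set is not None:
            outcome.append(working_set)
    return outcome


# Input table from the slide (the vin column is elided there, so it is omitted here).
table = [
    {"age": a, "model": m}
    for a, m in [(32, "van"), (45, "sports"), (16, "truck"), (17, "sports"),
                 (19, "sedan"), (13, "van"), (56, "sedan"), (46, "sports"), (28, "van")]
]

repository = [
    # Minimum-Age: filter out records where the age is less than 17.
    Rule("Minimum-Age", lambda ws: ws["age"] < 17, lambda ws: None),
    # Hypothetical rule: flag young sports-car drivers by inserting a column.
    Rule("Young-Sports-Driver",
         lambda ws: ws["age"] < 25 and ws["model"] == "sports",
         lambda ws: {**ws, "high_risk": True}),
]

outcome = execute(table, repository)
```

With this repository the two records aged 16 and 13 are dropped, and only the 17-year-old sports-car driver gains the `high_risk` column; the input table itself is never mutated because every action returns a new dictionary.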
16
Benefits of Cask DRE
Cask Distributed Rules Engine (DRE) is a tool that helps implement a production rule system with forward chaining that:
• Makes it easier to program data transformations for Big Data
• Enables non-technical domain experts to take part in data transformation
Key features:
• Integration with Apache Spark/Spark Streaming & MapReduce
• Insights into Rule Execution
• ~1K pre-built Actions
• Interactive User Interface
17
Details of DRE
• Records are represented as an in-memory working set
• A set of rules declaratively defines conditions
• Actions are executed, or inferences derived, based on the rules
• A planner executes rules in the map() phase or the reduce() phase
Forward Chaining: starts with the data (records), triggers actions and generates the outcome (the target state).
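Forward chaining can be illustrated with a naive loop: keep firing every rule whose condition holds until a full pass changes nothing, at which point the working set is the target state. This is a minimal sketch, assuming facts are key/value pairs and that a rule only "fires" when it actually changes the working set; the rule names are hypothetical. Production rule engines commonly use match algorithms such as Rete to avoid re-testing every rule on every pass; the deck does not say which strategy Cask DRE uses.

```python
from typing import Callable, Dict, List, Tuple

WorkingSet = Dict[str, object]
# (rule id, condition on the working set, assignments to make when it fires)
Rule = Tuple[str, Callable[[WorkingSet], bool], Callable[[WorkingSet], WorkingSet]]


def forward_chain(facts: WorkingSet, rules: List[Rule], max_cycles: int = 100):
    """Fire matching rules repeatedly until no rule changes the working set."""
    ws = dict(facts)
    fired: List[str] = []
    for _ in range(max_cycles):
        changed = False
        for rule_id, when, then in rules:
            if not when(ws):
                continue
            updates = then(ws)
            # A rule only counts as fired if it changes something; otherwise
            # it would match again on every cycle and never let the loop stop.
            if any(ws.get(key) != value for key, value in updates.items()):
                ws.update(updates)
                fired.append(rule_id)
                changed = True
        if not changed:
            return ws, fired        # fixpoint reached: this is the target state
    raise RuntimeError("no fixpoint reached within max_cycles")


# Hypothetical rules, listed so that each one depends on a fact derived by the
# rule listed after it, which forces several cycles of chaining.
rules: List[Rule] = [
    ("Surcharge", lambda ws: ws.get("risk") == "high", lambda ws: {"surcharge_pct": 15}),
    ("High-Risk", lambda ws: ws.get("young") is True and ws.get("model") == "sports",
     lambda ws: {"risk": "high"}),
    ("Young-Driver", lambda ws: ws["age"] < 25, lambda ws: {"young": True}),
]

final, fired = forward_chain({"age": 22, "model": "sports"}, rules)
```

Starting from only `age` and `model`, the engine derives `young`, then `risk`, then `surcharge_pct`, each in a later cycle than the fact it depends on.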
18
Rules
Rules allow users to specify requirements or knowledge about processing using declarative (say what should happen, not how to do it), logic-based languages.
19
Rules
Condition: when the condition is satisfied, based on the data in the working set,
Action: then apply the actions to the data in the working set.
If dark, then turn on the lights
If thirsty, then drink some water
If sleepy, then go to bed
20
Anatomy of a Rule
rule <id> {

description '<description>'

when(<LHS>)

then {

<RHS>

}

}
When the LHS is satisfied, the RHS is executed.
• Version of Rulebook: any modification to a Rule updates the Rulebook version
• description: describes what the rule actually does
• when(<LHS>): defines the condition that must be satisfied before the actions are executed
• then { <RHS> }: the actions to be executed when the LHS evaluates to true
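The rule anatomy is regular enough to split into its parts mechanically. The following Python sketch parses the layout shown above into id, description, LHS and RHS directives. It is an illustration only, not the DRE's parser: the `ParsedRule` class and `parse_rule` function are hypothetical, and it assumes straight single quotes, no `'` inside the description and no `}` inside the RHS.

```python
import re
from dataclasses import dataclass
from typing import List

# rule <id> { description '<description>' when(<LHS>) then { <RHS> } }
RULE_RE = re.compile(
    r"rule\s+(?P<id>[\w-]+)\s*\{\s*"
    r"description\s+'(?P<description>[^']*)'\s*"
    r"when\s*\((?P<lhs>.*)\)\s*"
    r"then\s*\{(?P<rhs>.*?)\}\s*\}",
    re.DOTALL,
)


@dataclass
class ParsedRule:
    rule_id: str
    description: str
    when: str          # LHS condition, kept as text
    then: List[str]    # RHS directives, one per entry


def parse_rule(text: str) -> ParsedRule:
    match = RULE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("text does not follow the rule anatomy")
    # Directives may be separated by newlines and/or terminated by ';'.
    directives = [d.strip() for d in re.split(r"[;\n]", match["rhs"]) if d.strip()]
    return ParsedRule(match["id"], match["description"], match["lhs"].strip(), directives)


example = """
rule Minimum-Age {
  description 'Filter Records where the age of person is less than 17'
  when(age < 17)
  then {
    filter-row-if-true true;
  }
}
"""
parsed = parse_rule(example)
```

The greedy `(?P<lhs>.*)` stops at the last `)` that is followed by `then`, so nested calls such as `present(ssn)` stay intact inside the LHS.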
21
Example of a Rule
rule Minimum-Age {

description 'Increase the premium by 15% if the driver is younger than 25 and drives a sports car'

when(age < 25 && model == "sports")

then {

set-column premium (premium + premium * 0.15)

}

}
This rule increases the premium by 15% when the driver is younger than 25 and drives a sports car.
When a record has age less than 25 and model equal to "sports", the action is triggered and the premium column is updated.
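A worked version of the arithmetic: for a premium of 1000, the action computes 1000 + 1000 × 0.15 = 1150. The Python function below is a hypothetical rendering of the Minimum-Age rule for illustration, not code generated by the DRE; it also shows that a driver aged exactly 25 is not matched, because the condition uses a strict `<`.

```python
def minimum_age_rule(record: dict) -> dict:
    """Python rendering of the Minimum-Age rule: +15% premium for drivers under 25 in sports cars."""
    # when(age < 25 && model == "sports")
    if record["age"] < 25 and record["model"] == "sports":
        # set-column premium (premium + premium * 0.15)
        return {**record, "premium": record["premium"] + record["premium"] * 0.15}
    return record


young = minimum_age_rule({"age": 22, "model": "sports", "premium": 1000.0})
boundary = minimum_age_rule({"age": 25, "model": "sports", "premium": 1000.0})
```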
22
Rule Condition
23
Rule Patterns
rule Minimum-Age {

description 'Filter records where the age of the person is less than 17'

when(age < 17)

then {

filter-row-if-true true;

}

}
This rule rejects records whose age is less than 17, filtering them out of processing.
Conditions support <, >, ==, <=, >=, matches / not matches and contains / not contains.
The following rule provides a discount if the customer is married and older than 25.
rule Discount {

description 'Provide a discount if the customer is married and older than 25'

when(married == true && age > 25)

then {

set-column discount 10;

}

}
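The condition operators listed above can be modeled as a lookup table from operator name to a Python function. This sketch assumes that `matches` means a full regular-expression match and `contains` means substring containment; the slides do not define those semantics, and the `check` and `discount_rule` helpers are hypothetical.

```python
import operator
import re

# Assumed semantics for the condition operators listed on the slide.
CONDITION_OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "matches": lambda value, pattern: re.fullmatch(pattern, str(value)) is not None,
    "not matches": lambda value, pattern: re.fullmatch(pattern, str(value)) is None,
    "contains": lambda value, part: part in str(value),
    "not contains": lambda value, part: part not in str(value),
}


def check(record: dict, column: str, op: str, operand) -> bool:
    """Evaluate one condition against a column of the working set."""
    return CONDITION_OPS[op](record[column], operand)


def discount_rule(record: dict) -> dict:
    # when(married == true && age > 25) then { set-column discount 10; }
    if check(record, "married", "==", True) and check(record, "age", ">", 25):
        return {**record, "discount": 10}
    return record
```

Because `>` is strict, a married customer aged exactly 25 receives no discount.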
24
Rule Actions
• Any Data Prep directive (a micro data transformation instruction)
• Operations on the working set:
• Update Column: tells the engine that the column has changed
• Insert New Column: adds a new column to the working set
• Delete Column: removes a column from the working set
• Stateful: temporary variables are available across rules
rule <Rule-Id> {

…

then {

<data prep directive>;

<data prep directive>;

<data prep directive>;

<data prep directive>;

}

}
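The working-set operations listed above (update, insert and delete a column, plus temporary variables shared across rules) can be sketched as a small Python class. This is a hypothetical model for illustration, not the DRE's internal API; the class name, method names and the `changed` set that "tells the engine" which columns changed are all assumptions.

```python
class WorkingSet:
    """Hypothetical working set: one record, change tracking and shared temporaries."""

    def __init__(self, record, temporaries=None):
        self.columns = dict(record)
        self.changed = set()   # columns the engine has been told about
        # The temporaries dict can be shared by every rule that sees this working set.
        self.temporaries = temporaries if temporaries is not None else {}

    def update_column(self, name, value):
        if name not in self.columns:
            raise KeyError(f"unknown column: {name}")
        self.columns[name] = value
        self.changed.add(name)

    def insert_column(self, name, value):
        if name in self.columns:
            raise KeyError(f"column already exists: {name}")
        self.columns[name] = value
        self.changed.add(name)

    def delete_column(self, name):
        del self.columns[name]     # raises KeyError if the column is missing
        self.changed.add(name)


# Usage: a sequence of actions on one working set, sharing one temporary.
temporaries = {}
ws = WorkingSet({"age": 30, "amount": 200.0}, temporaries)
ws.insert_column("discount", 10)
ws.update_column("amount", 180.0)
ws.delete_column("age")
ws.temporaries["discounted"] = ws.temporaries.get("discounted", 0) + 1
```

Tracking changed columns lets an engine decide which rules might need re-evaluation after an action, which is the purpose of "tell the engine the column has changed".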
25
Rule Example 1
If the dataset contains an SSN field, always mask it so that only the last four digits of the SSN are preserved.
rule Mask-SSN {

description 'Mask SSN to format xxx-xx-####'

when(present(ssn))

then {

mask-number ssn xxx-xx-####

}

}
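To show what the mask pattern does, here is a Python sketch of a masking function. It assumes, without confirmation from the deck, that each `x` hides the next digit of the value, each `#` keeps it, and any other pattern character is copied literally, so both `123-45-6789` and `123456789` become `xxx-xx-6789`. The `mask_number` and `mask_ssn_rule` functions are hypothetical Python stand-ins, not the Data Prep directive itself.

```python
def mask_number(value: str, pattern: str) -> str:
    """Assumed semantics: 'x' hides the next digit, '#' keeps it, other chars are literal."""
    digits = [c for c in value if c.isdigit()]
    slots = sum(1 for c in pattern if c in "x#")
    if len(digits) != slots:
        raise ValueError(f"{value!r} does not fit pattern {pattern!r}")
    remaining = iter(digits)
    out = []
    for c in pattern:
        if c == "#":
            out.append(next(remaining))   # keep this digit
        elif c == "x":
            next(remaining)               # consume and hide this digit
            out.append("x")
        else:
            out.append(c)                 # literal separator from the pattern
    return "".join(out)


def mask_ssn_rule(record: dict) -> dict:
    # when(present(ssn)) then { mask-number ssn xxx-xx-#### }
    if "ssn" in record:
        return {**record, "ssn": mask_number(record["ssn"], "xxx-xx-####")}
    return record
```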
26
Rule Example 2
If the work location in the dataset being processed is empty or is not one of the allowed locations, send the record to error so it can be collected for investigation.
rule Location-Validator {

description 'Send the record to error if the work location state is empty or is not one of the valid states.'

when(present(workloc) && (dq:isEmpty(workloc) || !(workloc =~ [ 'AK', 'AL', 'AR', 'AZ',
'CA', ])))

then {

send-to-error true

}

}
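The intent described above (a present but empty or disallowed work location goes to error) can be mirrored in Python to make the routing explicit. This is a sketch: only the states visible on the slide are included because the full list is elided there, "empty" is assumed to mean `None` or whitespace-only, and `send_to_error` and `route` are hypothetical names.

```python
# Only the states visible on the slide; the full allowed list is elided there.
ALLOWED_STATES = {"AK", "AL", "AR", "AZ", "CA"}


def send_to_error(record: dict) -> bool:
    """True when workloc is present but empty or not an allowed state."""
    if "workloc" not in record:          # present(workloc) is false: rule does not fire
        return False
    loc = record["workloc"]
    # Assumption: None or whitespace-only counts as empty.
    if loc is None or str(loc).strip() == "":
        return True
    return loc not in ALLOWED_STATES


def route(records):
    """Split records into (valid, errors), the way send-to-error collects them."""
    valid, errors = [], []
    for record in records:
        (errors if send_to_error(record) else valid).append(record)
    return valid, errors


valid, errors = route([{"workloc": "CA"}, {"workloc": ""}, {"workloc": "TX"}, {"name": "x"}])
```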
27
Rule Example 3
Check that the subscriber ID is a number of length 8. If the subscriber ID is invalid, filter the record out of processing.
rule SubscriberId-Validator {

description 'Validate Subscriber Id.'

when(present(subscriberId) && (!dq:isNumber(subscriberId) || subscriberId.length != 8))

then {

filter-row-if-true true

}

}
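A subscriber ID is invalid if it is not a number or if its length is not 8, so either failure alone must filter the record. The Python sketch below expresses both checks with a single regular expression; it assumes "number" means ASCII digits only (no sign or decimal point), and `keep_record` is a hypothetical helper, not a DRE function.

```python
import re

# Eight ASCII digits; fullmatch makes sure nothing else surrounds them.
SUBSCRIBER_ID = re.compile(r"[0-9]{8}")


def keep_record(record: dict) -> bool:
    """False when subscriberId is present but is not an 8-digit number."""
    if "subscriberId" not in record:
        return True                     # present(subscriberId) is false: rule does not fire
    # Invalid if it is not a number OR its length is not 8; one regex covers both.
    return SUBSCRIBER_ID.fullmatch(str(record["subscriberId"])) is not None


records = [
    {"subscriberId": "12345678"},   # valid
    {"subscriberId": "1234567"},    # numeric but only 7 digits
    {"subscriberId": "abcdefgh"},   # 8 characters but not a number
    {"name": "no id"},              # no subscriberId column
]
kept = [r for r in records if keep_record(r)]
```

The 7-digit case is the one worth testing explicitly: it is numeric, so it passes the number check, and only the length check rejects it.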
28
Rule Example 4
If the customer is above 40 and has purchased more than 10 items from the Pantry, then give them a 9.8% discount off the total price they pay.
rule Customer-Pantry-Discount {

description 'Customer 9.8% discount when age is above 40 and more than 10 items were purchased from pantry'

when(age > 40 && items > 10)

then {

set-column discount 9.8

set-column net_amount (amount - (amount * 0.098))

}

}
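As a worked example of the net-amount formula: for an amount of 200, the discount is 200 × 0.098 = 19.6, so `net_amount` = 200 − 19.6 = 180.4. The Python rendering below uses `decimal.Decimal` rather than binary floats, a design choice for monetary values that the original rule does not specify; `pantry_discount` is a hypothetical name.

```python
from decimal import Decimal

DISCOUNT_RATE = Decimal("0.098")   # 9.8%


def pantry_discount(record: dict) -> dict:
    """Python rendering of Customer-Pantry-Discount."""
    # when(age > 40 && items > 10)
    if record["age"] > 40 and record["items"] > 10:
        amount = Decimal(str(record["amount"]))
        return {
            **record,
            "discount": Decimal("9.8"),                       # set-column discount 9.8
            "net_amount": amount - amount * DISCOUNT_RATE,    # set-column net_amount (...)
        }
    return record


result = pantry_discount({"age": 45, "items": 12, "amount": 200})
# 200 - 200 * 0.098 = 200 - 19.6 = 180.4
```

Both comparisons are strict, so a 40-year-old customer, or one who bought exactly 10 items, is left unchanged.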
29
Anatomy of a Rulebook
• rulebook <Id>: specifies an Id for the Rulebook
• version: version of the Rulebook; any modification to a Rule updates the Rulebook version
• meta: metadata that defines additional information for the Rulebook
• <rule> …: the collection of Rules
rulebook <Id> {

version <number>

meta {

description '<description>'

created-date <date>

updated-date <date>

source '<source>'

user '<user>'

}

<rule>

<rule>

<rule>

. . .

<rule>

}
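The rule that "any modification in a Rule will update the Rulebook version" can be sketched as a small Python data structure. This is a hypothetical model, not the DRE's storage format: it assumes the version is an integer that starts at 1 and increases by one per change, treats re-saving identical rule text as no modification, and refreshes the `updated-date` metadata entry on every change.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class Rulebook:
    rulebook_id: str
    version: int = 1
    meta: Dict[str, str] = field(default_factory=dict)
    rules: Dict[str, str] = field(default_factory=dict)   # rule id -> rule text

    def _touch(self) -> None:
        # Any modification bumps the version and records when it happened.
        self.version += 1
        self.meta["updated-date"] = date.today().isoformat()

    def put_rule(self, rule_id: str, text: str) -> None:
        """Add or replace a rule; re-saving identical text is not a modification."""
        if self.rules.get(rule_id) == text:
            return
        self.rules[rule_id] = text
        self._touch()

    def remove_rule(self, rule_id: str) -> None:
        del self.rules[rule_id]
        self._touch()


# Placeholder rule texts, for illustration only.
book = Rulebook("auto-insurance", meta={"description": "Premium adjustments", "user": "analyst"})
book.put_rule("Minimum-Age", "rule text v1")   # version 1 -> 2
book.put_rule("Minimum-Age", "rule text v1")   # identical text, still 2
book.put_rule("Minimum-Age", "rule text v2")   # version 2 -> 3
book.remove_rule("Minimum-Age")                # version 3 -> 4
```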
30
Demo
31
Technical Webinar Series
Live Technical Webinar Series: Moving Big Data Forward with Cask
Oct 31, 2017 @ 11am PT / 2pm ET: RSVP at https://cask.co/company/events
Oct 5, 2017 @ 11am PT / 2pm ET and Oct 19, 2017 @ 11am PT / 2pm ET: watch on demand @ cask.co/resources/webinars/
32
To learn more, go to cask.co
or contact us at info@cask.co
Questions?

Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 

Introducing a horizontally scalable, inference-based business Rules Engine for Big Data processing

  • 1. Big Data on Tap cask.co rule Distributed-Rules-Engine-DRE { description 'Presentation of Distributed Rules Engine (DRE)' when(presenter =~ "Nitin Motgi" && title =~ "Co-founder & CTO" && datetime == today()) then { welcome; present; question-and-answer; } }
  • 2. Who is Cask
 Founded in 2011 by early Hadoop engineers from Facebook and Yahoo!
 Raised $37+ Million from Andreessen Horowitz, Safeguard, Battery Ventures and Ignition Partners
 Strategic Investors: AT&T, Cloudera and Ericsson
 Unique Value Proposition: First Unified Integration Platform for rapid time-to-value from Big Data
 Key Customers & Partners: AT&T, Cloudera, Ericsson, Hortonworks, IBM, MapR, Microsoft, Salesforce, Thomson Reuters, …
 Mature Platform: CDAP 4, featuring Cask Market, the “big data app store”
 Why “Cask”? A Container Architecture that puts Big Data on Tap
  • 3. What is Cask Data Application Platform (CDAP)? A unified platform for building integrated data analytics tools and applications, and for delivering specialized frameworks and solutions, that enable enterprises to rapidly extract value from data. (Diagram: Runtime, APIs, Tools, Frameworks, Market)
  • 4. Pre-Built Tools and Frameworks
 Data Prep (Framework): • Data Preparation for on-boarding new sources and datasets • Perform Data Transformations and Data Quality checks with visual feedback • Extend Data Prep by building new user-defined directives • Integrates with Data Pipeline for operationalizing transformations
 Data Pipeline (Framework): • User interface for building complex data workflows • Join, Lookup, Aggregate and Filter data in-flight • Build complex workflows with 100s of connectors • Extend Data Pipeline using simple APIs • Integrates with Data Prep, Rules Engine and Metadata Aggregator
  • 5. Pre-Built Tools and Frameworks
 Rules Engine (Tool): • Business Data Transformations and checks codified for business users • Define complex rules using an intuitive, simple-to-use user interface • Logically group Rules in a Rulebook and trigger or schedule processing • Integrate with Data Pipeline for operationalizing Rules
 Metadata Aggregator (Tool): • Aggregate Business, Technical and Operational Metadata • Track the flow of data (Lineage) to provide the richer data needed for governance • Create a Data Dictionary and Metadata Repository • Integrate with enterprise MDM solutions • Integrates with Data Pipeline and Rules Engine
  • 6. Pre-Built Tools and Frameworks
 Microservice (Framework): • Build specialized logic for processing data • Create a loosely coupled network for processing events • Connect them using Amazon SQS, WebSocket, MapR Streams and Kafka
 Event Condition Action, ECA (Application): • Delivers a specialized solution for IoT event processing • Parses any event, triggers conditions and executes actions • Real-time notification system, with an easy-to-use user interface for configuring event parsing, conditions and actions
  • 7. Realizing Value from Data
 • Empower Business Users • On-Board New Data Sources and Types • Integrate with Cloud and Enterprise Ecosystem • Build Data Processing Workflows • Integrate Machine Learning and Model Management • Expand and Customize to build new Data Analytics Applications
 Data is Your Asset. Cloud Connectors + Run on Public Cloud + Integrate with Existing Apps
 (Diagram: Platform, Automate, Operationalize, Intelligence, Integration, Extend & Innovate, Business Value, ML Integration, Data Prep, Rules Engine, Data Pipelines)
  • 8. What is a Rules Engine? “Externalize the business logic that is dynamic” “Make code intuitive and readable, easily understood by business people” “Simplify complicated requirements with declarative logic; raise the level of abstraction”
  • 9. Why a Rules Engine? Rules engines are a great way to collect all of the complex decision-making logic involved in Data Integration in one place and to apply it to large datasets
  • 10. When to use a Rules Engine? “When transformation logic changes often or new feeds are constantly being on-boarded into the Data Lake” “When you want to involve non-technical domain experts in the ingestion and integration of Big Data” “When you want to separate code from logic”
  • 12. Introducing Cask Distributed Rules Engine (Cask DRE)
  • 13. Data Transformation Flow with Cask DRE • Involve business analysts or non-technical domain experts in Big Data • Transform data using a declarative language instead of an imperative one • Centralize data transformation into a self-documenting knowledge base • Real-time or batch: do it exactly the same way, at scale, on Hadoop or Spark • Integrate with workflows to operationalize at scale
  • 15. Rule Execution
 Input Table:
 age | vin | model
 32 | … | van
 45 | … | sports
 16 | … | truck
 17 | … | sports
 19 | … | sedan
 13 | … | van
 56 | … | sedan
 46 | … | sports
 28 | … | van
 A record is an individual row in the table, made up of columns; each cell contains the value for its column.
 Working Set: each record is treated as the working set on which the DRE operates.
 DRE + Rule Repository: the DRE updates the working set based on the actions of the rules that fire, producing the Outcome.
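The flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Cask DRE API: it assumes each record is a plain dict that serves as its own working set, and that a rule is a hypothetical (condition, action) pair of functions. The `vin` values are elided on the slide, so the sketch leaves that column out, and the `risk` column and its rule are invented for illustration.

```python
from typing import Any, Callable

Record = dict[str, Any]
# A rule is modelled here as a (condition, action) pair of plain functions.
Rule = tuple[Callable[[Record], bool], Callable[[Record], None]]


def execute(table: list[Record], rules: list[Rule]) -> list[Record]:
    """Apply every rule to each row; each row becomes its own working set."""
    outcome = []
    for row in table:
        working_set = dict(row)  # copy so the input table is not modified
        for condition, action in rules:
            if condition(working_set):
                action(working_set)  # the action updates the working set in place
        outcome.append(working_set)
    return outcome


def young_sports_driver(r: Record) -> bool:
    return r["age"] < 25 and r["model"] == "sports"


def mark_high_risk(r: Record) -> None:
    r["risk"] = "high"  # hypothetical column added by the action


# Rows taken from the slide; the vin column is elided there, so it is omitted.
input_table = [
    {"age": 32, "model": "van"},
    {"age": 45, "model": "sports"},
    {"age": 16, "model": "truck"},
    {"age": 17, "model": "sports"},
    {"age": 19, "model": "sedan"},
]
outcome = execute(input_table, [(young_sports_driver, mark_high_risk)])
print([r.get("risk") for r in outcome])  # [None, None, None, 'high', None]
```

Copying each row before running the rules keeps the input table intact, which mirrors the slide's separation between the Input Table and the Outcome.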
  • 16. Benefits of Cask DRE Cask Distributed Rules Engine (DRE) is a tool that helps implement a production rule system with forward chaining. It: • Makes it easier to program data transformations for Big Data • Enables non-technical domain experts • Integrates with Apache Spark/Spark Streaming & MapReduce • Provides insights into Rule execution • Includes ~1K pre-built Actions • Offers an interactive User Interface
  • 17. Details of DRE • Records represented as an in-memory working set • A set of Rules that declaratively define conditions • Actions executed, or inferences derived, based on the rules • A Planner to execute Rules in the map() phase or the reduce() phase. Forward Chaining: starts with data or records and triggers actions or generates an outcome (the Target State).
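Forward chaining, as described above, means starting from the data and repeatedly firing rules whose conditions match until nothing new can be derived. The sketch below is a deliberately naive loop under that reading; production rule engines typically use a matching algorithm such as Rete plus conflict resolution, and the slides do not say which strategy the DRE uses. The rule names and the `risk`/`surcharge` columns are hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]
NamedRule = tuple[str, Callable[[Record], bool], Callable[[Record], None]]


def forward_chain(working_set: Record, rules: list[NamedRule], max_cycles: int = 100) -> list[str]:
    """Fire matching rules repeatedly until no rule changes the working set.

    Returns the names of the rules whose actions changed the working set, in order.
    """
    fired: list[str] = []
    for _ in range(max_cycles):
        changed = False
        for name, condition, action in rules:
            if condition(working_set):
                before = dict(working_set)
                action(working_set)
                if working_set != before:  # ignore firings that change nothing
                    fired.append(name)
                    changed = True
        if not changed:
            return fired  # fixpoint: the target state has been reached
    raise RuntimeError("rules did not reach a fixpoint")


# "apply-surcharge" depends on a fact that "flag-risk" derives, so it can only
# fire on a later cycle: that is the forward-chaining step.
rules: list[NamedRule] = [
    ("apply-surcharge", lambda r: r.get("risk") == "high", lambda r: r.update(surcharge=0.15)),
    ("flag-risk", lambda r: r["age"] < 25 and r["model"] == "sports", lambda r: r.update(risk="high")),
]
ws = {"age": 19, "model": "sports"}
print(forward_chain(ws, rules))  # ['flag-risk', 'apply-surcharge']
print(ws)  # {'age': 19, 'model': 'sports', 'risk': 'high', 'surcharge': 0.15}
```

Actions here are idempotent (they set values rather than accumulate them), which is what lets the loop reach a fixpoint; an action like "add 15% to premium" would need a guard to avoid firing forever.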
  • 18. Rules Allow users to specify requirements or knowledge about processing using declarative (say what should happen, not how to do it), logic-based languages.
  • 19. Rules: Condition and Action. When the condition is satisfied by data in the working set, then apply the actions to the data in the working set. • If dark, then turn on the lights • If thirsty, then drink some water • If sleepy, then go to bed
  • 20. Anatomy of a Rule rule <id> { description '<description>' when(<LHS>) then { <RHS> } } When the LHS is satisfied, the RHS is executed. • Version of Rulebook: any modification to a Rule updates the Rulebook version • description: what the rule actually does • when(<LHS>): defines the condition to be met before the actions are executed • then { <RHS> }: the actions to be executed when the LHS evaluates to true
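To make the four parts of a rule concrete, here is a small regex-based parser that splits rule text into its id, description, LHS and RHS. It is an illustrative sketch, not the DRE's actual grammar: it assumes straight quotes, no apostrophes inside the description, no braces inside the actions, and actions separated by semicolons or newlines.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    id: str
    description: str
    when: str        # LHS: the condition expression
    then: list[str]  # RHS: one action (directive) per entry


RULE_PATTERN = re.compile(
    r"rule\s+(?P<id>[\w-]+)\s*\{\s*"
    r"description\s+'(?P<description>[^']*)'\s*"
    r"when\s*\((?P<when>.*)\)\s*"  # greedy, so nested parentheses in the LHS survive
    r"then\s*\{(?P<then>[^{}]*)\}\s*\}",
    re.DOTALL,
)


def parse_rule(text: str) -> Rule:
    match = RULE_PATTERN.search(text)
    if match is None:
        raise ValueError("text does not follow the rule <id> { ... } layout")
    # Actions may be separated by semicolons or newlines.
    actions = [a.strip() for a in re.split(r"[;\n]", match["then"]) if a.strip()]
    return Rule(match["id"], match["description"], match["when"].strip(), actions)


rule = parse_rule("""
rule Discount {
  description 'Provide discount if customer is married and is older than 25'
  when(married == true && age > 25)
  then { set-column discount 10; }
}
""")
print(rule.id, "|", rule.when, "|", rule.then)
# Discount | married == true && age > 25 | ['set-column discount 10']
```

A real engine would use a proper grammar rather than a regex; the point here is only the LHS/RHS split the slide describes.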
  • 21. Example of a Rule rule Minimum-Age { description 'Increase the premium by 15% if driver is less than 25 and drives a sports car' when(age < 25 && model == "sports") then { set-column premium (premium + premium * 0.15) } } This rule increases the premium by 15% when the driver is younger than 25 and drives a sports car: when a record has age less than 25 and model equal to "sports", the action is triggered.
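As a quick check of the arithmetic, the Minimum-Age rule can be mirrored as a plain Python function; `premium + premium * 0.15` is the same as `premium * 1.15`. The premium values below are made up for illustration, and this is not how the DRE evaluates rules internally.

```python
def minimum_age_premium(record: dict) -> dict:
    """Python mirror of the Minimum-Age rule (illustrative only)."""
    if record["age"] < 25 and record["model"] == "sports":  # when(...)
        # set-column premium (premium + premium * 0.15)
        record["premium"] = record["premium"] + record["premium"] * 0.15
    return record


drivers = [
    {"age": 17, "model": "sports", "premium": 1000.0},  # matches: 1000 -> 1150
    {"age": 45, "model": "sports", "premium": 1000.0},  # too old: unchanged
    {"age": 16, "model": "truck", "premium": 1000.0},   # not a sports car: unchanged
]
for d in drivers:
    minimum_age_premium(d)
print([round(d["premium"], 2) for d in drivers])  # [1150.0, 1000.0, 1000.0]
```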
  • 23. Rule Patterns The following rule rejects records (or sends them to error) where the person's age is less than 17: rule Minimum-Age { description 'Filter Records where the age of person is less than 17' when(age < 17) then { filter-row-if-true true; } } Conditions support <, >, ==, <=, >=, matches / not matches, contains / not contains. The following rule provides a discount if the customer is married and is older than 25: rule Discount { description 'Provide discount if customer is married and is older than 25' when(married == true && age > 25) then { set-column discount 10; } }
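The condition operators listed above map naturally onto Python predicates. The table below is an assumption about their meaning (for instance, that `matches` tests the whole value against a regular expression), not taken from DRE documentation; it is used to mirror the two rules on this slide over a few hypothetical records.

```python
import operator
import re

# Assumed meaning of each condition operator listed on the slide.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "matches": lambda value, pattern: re.fullmatch(pattern, str(value)) is not None,
    "not matches": lambda value, pattern: re.fullmatch(pattern, str(value)) is None,
    "contains": lambda value, item: item in value,
    "not contains": lambda value, item: item not in value,
}


def holds(record: dict, column: str, op: str, operand) -> bool:
    return OPERATORS[op](record[column], operand)


def minimum_age_filter(records: list[dict]) -> list[dict]:
    # rule Minimum-Age: filter out records where age < 17
    return [r for r in records if not holds(r, "age", "<", 17)]


def discount(record: dict) -> dict:
    # rule Discount: when(married == true && age > 25) then { set-column discount 10; }
    if holds(record, "married", "==", True) and holds(record, "age", ">", 25):
        record["discount"] = 10
    return record


people = [
    {"name": "Ana", "age": 16, "married": False},
    {"name": "Raj", "age": 30, "married": True},
    {"name": "Lee", "age": 22, "married": True},
]
kept = [discount(r) for r in minimum_age_filter(people)]
print([(r["name"], r.get("discount")) for r in kept])  # [('Raj', 10), ('Lee', None)]
```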
  • 24. Rule Actions • Any Data Prep Directive: micro data transformation instructions • Operations on the working set • Update Column: tells the engine the column has changed • Insert new Column: adds a new column to the working set • Delete Column: removes a column from the working set • Stateful: temporary variables available across Rules. rule <Rule-Id> { … then { <data prep directive>; <data prep directive>; <data prep directive>; <data prep directive>; } }
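The working-set operations above can be pictured as a small class. This is a hypothetical sketch, not the DRE's internal representation: it records which columns were updated, inserted or deleted (so an engine could decide which rules to re-evaluate) and holds temporary variables that later rules can read.

```python
from typing import Any


class WorkingSet:
    """Hypothetical working set: one record, change tracking and rule-scoped variables."""

    def __init__(self, record: dict[str, Any]) -> None:
        self.columns = dict(record)
        self.changed: set[str] = set()       # columns modified, so dependent rules can re-run
        self.variables: dict[str, Any] = {}  # temporary values shared across rules

    def update_column(self, name: str, value: Any) -> None:
        if name not in self.columns:
            raise KeyError(f"no column {name!r} to update")
        self.columns[name] = value
        self.changed.add(name)

    def insert_column(self, name: str, value: Any) -> None:
        if name in self.columns:
            raise KeyError(f"column {name!r} already exists")
        self.columns[name] = value
        self.changed.add(name)

    def delete_column(self, name: str) -> None:
        del self.columns[name]
        self.changed.add(name)


ws = WorkingSet({"ssn": "123-45-6789", "amount": 200.0})
# First rule: remember the original amount, then update the column.
ws.variables["original_amount"] = ws.columns["amount"]
ws.update_column("amount", 180.0)
# Second rule: reads the temporary variable left by the first rule.
ws.insert_column("savings", ws.variables["original_amount"] - ws.columns["amount"])
ws.delete_column("ssn")
print(ws.columns)  # {'amount': 180.0, 'savings': 20.0}
```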
  • 25. Rule Example 1 If the dataset contains an SSN field, always mask it so that only the last four digits of the SSN are preserved. rule Mask-SSN { description 'Mask SSN to format xxx-xx-####' when(present(ssn)) then { mask-number ssn xxx-xx-#### } }
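A Python sketch of what this rule does, assuming a positional reading of the mask pattern: `#` keeps the input character at that position and every other pattern character is written out literally. That interpretation is an assumption for illustration, not taken from the `mask-number` directive's source.

```python
def mask_number(value: str, pattern: str) -> str:
    """Positional mask (assumed semantics): '#' keeps the input character at that
    position; every other pattern character is written out literally."""
    masked = []
    for i, p in enumerate(pattern):
        if p == "#":
            if i < len(value):
                masked.append(value[i])
        else:
            masked.append(p)
    return "".join(masked)


def mask_ssn(record: dict) -> dict:
    if "ssn" in record:  # when(present(ssn))
        record["ssn"] = mask_number(record["ssn"], "xxx-xx-####")
    return record


print(mask_ssn({"ssn": "123-45-6789"}))  # {'ssn': 'xxx-xx-6789'}
```

Because the mask is positional, it assumes the SSN already uses the `###-##-####` layout; a bare nine-digit value would need normalizing first.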
  • 26. Rule Example 2 If the work location in the dataset being processed is present and not empty but is not one of the allowed states, send the record to error so it can be collected for investigation. rule Location-Validator { description 'Check that the work location state is empty or one of the valid states.' when(present(workloc) && !dq:isEmpty(workloc) && !(workloc =~ [ 'AK', 'AL', 'AR', 'AZ', 'CA' ])) then { send-to-error true } }
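The Location-Validator condition can be traced in Python to see which records it routes to error. This mirrors the condition as written in the rule, under which an empty `workloc` is not sent to error. The state list is truncated on the slide, and treating `dq:isEmpty` as "empty string" is an assumption.

```python
# Truncated on the slide; the real list would hold every allowed state code.
VALID_STATES = {"AK", "AL", "AR", "AZ", "CA"}


def location_validator(record: dict) -> str:
    """Return 'error' when the Location-Validator condition is true, else 'ok'."""
    workloc = record.get("workloc")
    fires = (
        workloc is not None                # present(workloc)
        and workloc != ""                  # !dq:isEmpty(workloc), assumed: non-empty string
        and workloc not in VALID_STATES    # !(workloc =~ [...])
    )
    return "error" if fires else "ok"      # send-to-error true


for rec in ({"workloc": "CA"}, {"workloc": "TX"}, {"workloc": ""}, {}):
    print(rec, location_validator(rec))
```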
  • 27. Rule Example 3 Check that the Subscriber Id is a number of length 8. If the Subscriber Id is invalid, filter the record out of processing. rule SubscriberId-Validator { description 'Validate Subscriber Id.' when(present(subscriberId) && (!dq:isNumber(subscriberId) || subscriberId.length != 8)) then { filter-row-if-true true } }
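The stated goal here (drop any id that is not an 8-digit number) needs the two checks joined with `||`. Joined with `&&`, only ids that are both non-numeric and the wrong length are dropped. The sketch below compares the two combinations on a few sample ids, assuming "number" means one or more ASCII digits.

```python
import re


def is_number(value: str) -> bool:
    # Assumption: "number" means one or more ASCII digits.
    return re.fullmatch(r"[0-9]+", value) is not None


def filtered_with_and(sid: str) -> bool:
    # !dq:isNumber(subscriberId) && subscriberId.length != 8
    return not is_number(sid) and len(sid) != 8


def filtered_with_or(sid: str) -> bool:
    # !dq:isNumber(subscriberId) || subscriberId.length != 8
    return not is_number(sid) or len(sid) != 8


for sid in ("12345678", "12345", "ABCDEFGH", "12AB"):
    print(sid, filtered_with_and(sid), filtered_with_or(sid))
```

With `&&`, the short numeric id `12345` and the 8-letter id `ABCDEFGH` both slip through; with `||`, only `12345678` is kept.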
  • 28. Rule Example 4 If the customer is over 40 and has purchased more than 10 items from the Pantry, give them a 9.8% discount on the total price they pay. rule Customer-Pantry-Discount { description 'Customer 9.8% discount, when age above 40, purchased more than 10 items from pantry' when(age > 40 && items > 10) then { set-column discount 9.8; set-column net_amount (amount - (amount * 0.098)); } }
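A worked example of this rule's arithmetic, written in Python with `decimal.Decimal` so currency amounts are not subject to binary floating-point error. Rounding the net amount to cents is an added assumption; the rule itself does not round. Note that the condition as written checks only `age` and `items` and does not test the Pantry category.

```python
from decimal import Decimal, ROUND_HALF_UP

DISCOUNT_RATE = Decimal("0.098")  # 9.8%


def customer_pantry_discount(record: dict) -> dict:
    """Python mirror of Customer-Pantry-Discount, using Decimal for money."""
    if record["age"] > 40 and record["items"] > 10:
        amount = Decimal(str(record["amount"]))
        record["discount"] = Decimal("9.8")
        net = amount - amount * DISCOUNT_RATE
        # Rounding to cents is an assumption; the rule itself does not round.
        record["net_amount"] = net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return record


# 199.99 - 199.99 * 0.098 = 199.99 - 19.59902 = 180.39098, which rounds to 180.39
print(customer_pantry_discount({"age": 45, "items": 12, "amount": "199.99"})["net_amount"])  # 180.39
```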
  • 29. Anatomy of a Rulebook rulebook <Id> { version <number> meta { description '<description>' created-date <date> updated-date <date> source '<source>' user '<user>' } <rule> <rule> <rule> . . . <rule> } • <Id>: specifies an Id for the Rulebook • version: any modification to a Rule updates the Rulebook version • meta: metadata defining additional information for the Rulebook • <rule> entries: the collection of Rules
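The versioning behaviour of a Rulebook can be sketched as a small in-memory class. This is a hypothetical model of the layout on this slide, not the DRE's storage format: it bumps the version whenever a rule is added, changed or removed, and leaves it alone when an identical rule is re-submitted. The rulebook id and rule texts are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Rulebook:
    """Hypothetical in-memory Rulebook mirroring the layout on this slide."""
    id: str
    version: int = 1
    description: str = ""
    source: str = ""
    user: str = ""
    created_date: date = field(default_factory=date.today)
    updated_date: date = field(default_factory=date.today)
    rules: dict[str, str] = field(default_factory=dict)  # rule id -> rule text

    def _touch(self) -> None:
        # Any modification to the rule collection bumps the version.
        self.version += 1
        self.updated_date = date.today()

    def put_rule(self, rule_id: str, text: str) -> None:
        if self.rules.get(rule_id) == text:
            return  # identical rule: nothing changed, keep the version
        self.rules[rule_id] = text
        self._touch()

    def remove_rule(self, rule_id: str) -> None:
        del self.rules[rule_id]
        self._touch()


book = Rulebook("insurance-rules", description="Premium adjustments")
book.put_rule("Minimum-Age", "rule Minimum-Age { ... }")
book.put_rule("Minimum-Age", "rule Minimum-Age { ... }")  # unchanged: no bump
book.put_rule("Discount", "rule Discount { ... }")
book.remove_rule("Discount")
print(book.version)  # 4
```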
  • 30. Demo
  • 31. Technical Webinar Series
 Live Technical Webinar Series: Moving Big Data Forward with Cask
 Oct 5, 2017 @ 11am PT / 2pm ET
 Oct 19, 2017 @ 11am PT / 2pm ET
 Oct 31, 2017 @ 11am PT / 2pm ET
 RSVP at: https://cask.co/company/events
 Watch on demand @ cask.co/resources/webinars/
  • 32. Questions? To learn more, go to cask.co or contact us at info@cask.co