SlideShare a Scribd company logo
WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Mirko Bernardoni & Michael Seddon
Clifford Chance
Revolutionizing the Legal
Industry with Spark, NLP
and Azure Databricks
#UnifiedDataAnalytics #SparkAISummit
3
Mirko’s main role is to build and lead the data science lab.
He strongly believes that the work that Clifford Chance is doing in the
research department offers a unique opportunity to shape and change the
legal sector which is why he also work closely with universities for research.
About Mirko Bernardoni
MIRKO BERNARDONI
Head of Data Science
Clifford Chance
https://www.linkedin.com/i/mirko-bernardoni
4
Michael’s focus is on solving Clifford Chance’s data engineering challenges.
As a core member of the team, he has been involved in the design and development of the
data pipelines crucial to the lab’s success, as well as delivering numerous machine learning
projects helping to reshape Clifford Chance’s operational insights and execution.
About Michael Seddon
MICHAEL SEDDON
Senior Machine Learning Engineer
Clifford Chance
https://www.linkedin.com/in/mrmichaelseddon/
We are one of the world’s
pre-eminent law firms with
significant depth and range of
resources across five continents
Clifford Chance
As a global law firm we are able to support
clients at both a local and international level
across Europe, Asia Pacific, the Americas,
the Middle East and Africa.
Our global, cross-discipline teams advise on a
full range of legal solutions. We have a global
view, and through our sector approach, a
detailed understanding of our clients’ business,
its drivers and competitive landscapes.
Our Responsible Business strategy is integral
to our firm strategy. It guides how we conduct
our core business, how we develop and
support our people, and how we foster closer
collaboration with our clients.
• Doing business – we promote market-
shaping practices in relation to ethics,
professional standards and
risk management
• People – we realise the potential of our
people by creating a safe, healthy and
inclusive workplace, and broadening our
skills and experience
• Community – we partner to support our
community by widening access to
justice, finance and education
• Environment – we manage our
environmental footprint and contribute to
developing a more sustainable world.
At Clifford Chance, we are committed to
delivering a world-class service – providing
the highest quality advice and support
efficiently and effectively, every time.
Our clients, who include corporate
companies across all commercial and
industrial sectors, governments, regulators,
trade bodies and not-for-profit organisations
are at the heart of how we work.
Understanding what our clients value and
aligning with their needs underpins our
approach. We invest heavily to ensure that
clients benefit from our formidable knowledge
and market insights, that they have access to
the best team for the job, and that we bring
the right processes and advanced
technologies to bear on each matter.
Trusted legal advice
for the world’s
Leading businesses of
today and tomorrow
As a single, fully integrated, global partnership, we pride ourselves on our
approachable, collegiate and team-based way of working.
We are a single profit pool, lockstep
partnership. Our ambition is to work
collaboratively across geographies, practices,
product areas and sectors to deliver the best
advice and support to our clients.
Our structure
Our international network Responsible BusinessDelivering value to our clients
Agenda
Data Science in Law
Architecture
First success: Graph Analysis
NLP deep learning in Spark & Databricks
6
Data science in Law
Why would a law firm get involved in creating a data science lab?
7
Digital is changing how business gets done
Current
Law firm +
Digital Technologies
=
Future
Law Firm
• Agile platforms and solutions designed change and adapt
• Human augmentation (AI)
• Draw better insight out of data convert intelligent into action
• Simplify & digital work execution
Digital Trends
• Re-envision existing and enable new business
models
• Embrace different way of bringing people together
• Develop new capabilities that help organizations
transform themselves into digital organizations
New regulations, markets
• Staying ahead by anticipating what’s next
• Industry competitiveness (panels, benchmarking)
What we do in the Data Science Lab
360
information
products
We seek to
better understand our
data, diagnose, predict
business outcomes
and recommend what
to do to achieve
our goals.
Operational
insights,
predictions &
prescriptions
Turn our
data into new
Client
Products &
Services
Provide self-service 360 views of critical data, available to all;
to research further particular topics and enhance the existing
business intelligence reporting.
Provide cross-cutting operational insights, Anticipate what will
happen and recommend what to do to achieve goals.
For example:
• what’s the patterns to our profitability and what actions can we
take?
• can we better predict our fees?
Seek to create new products and services based on our data &
insights we can derive from it.
• Up to your imagination!
10
ELIVER
AI Adoption MATURITY CURVE
Address Cloud
challenges
Innovation pipeline
What are you
trying to do?
AI capability
AImaturity
From research projects to Client Solutions
BI & Apps
• Data driven business
analytics & reporting
Azure Data Services, SQL Server
Power Platform (Power BI, Power Apps,
Power Flow)
SaaS with AI
• Immediate actionable insights
with AI
Already productised: for example for due diligence, risk
management, contract automation,
Also with Office 365, Workplace Analytics)
Ad hoc AI
• AI Research
• Big data insight
• AI idea validations
Azure cloud services
Infrastructure as code
New opportunities
Speed to market
AI “Accelerators”
• Solution specific AI services
& patterns
Azure Cognitive Services
Azure Bot Services
Azure Search
Industry accelerators
AI Products
• Operationalise AI
• Product realisation
• Data Science & Deep AI
capabilities
Open Frameworks
Azure Databricks
Azure Machine Learning
Azure AI Infrastructure
Legal specific
DS-Lab end to end capability
11
Technology pioneers
Evangelization
DevOps / SecOps / Operationalization
Idea & business process management
Data Science
• Research
• Academic collaborations
• Modelling (e.g. ML and DL)
Data pipeline
• Data extraction / collection
• Transformation
• Compliance
• Confidentiality
Production
• Minimum Viable Product
• AI productionisation
• Product development
Architecture
The Legal Industry deals with confidential matters…
is the data science lab system architecture appropriate?
12
Confidentiality
13
User Access
Time validity
Contract limitations
Client Contracts
Audits!
Line of BusinessData Science
Dataset 1
Dataset 3
Dataset 2
Batch
Stream
ML+AI Community and
topics detection
ML+AI Relationship analysis
ML+AI Decision Automation
Data Engineering
Data Processing
Customers
Volume
Variety
Velocity
Many challenges in legal
Blob storage
Data Lake
On-premise
Architecture
15
First success: Graph analysis
What is the data science lab spark?
16
17
Understanding client relationships: a work in progress
Understanding customer
relationships is crucial to our
business and revenue
generation
We have a constant
battle getting people to
use our CRM
(customer relationship
management); it is
manually intensive, and
people's time is
pressured
Use our existing data to
uncover insights into our
customer and client
relationships
Client relationship analysis
18
Client relationship analysis
Key
performance
indicator
Define
datasets Academic
literature
& research
Demo
realization
Production
• Email system
• HR system
• CRM system
• Others..
• Knowledge
graph
• Graph
algorithms
• Dataset analysis
• Research
implementation
• User visualisation
• Minimum
viable product
• Productionise
the research
• Agile Project
• Confidential
19
Client relationship analysis
NLP deep learning in Spark &
Databricks
Let’s get serious
20
21
Document classification
LIBOR, Brexit and similar
exercise deals with huge
number of documents.
Document analysis requires
the identification of the
document type
The ability of
recognizing the
document types is a
requirement for many
matters
We often have urgent use
cases from client tenders
22
Document classification
Key
performance
indicator
Define
datasets Academic
literature
& research
Demo
realization
Production
• EDGAR • Text
classification
• Document
classification
• Deep learning
for NLP
• Work in progress • Minimum
viable product
• Productionise
the research
• Agile Project
• Confidential
Document example
• Document size 300 pages or more (more than 150.000 words)
23
Open EDGAR
Open EDGAR
24
Data download and cleaning
U.S.A
SEC Open EDGAR
Document Analysis
Document Cleaning
Blob
Storage
Text Extraction
15M raw documents
7M cleaned documents
250K state of the art
Metadata
Models development
• Cross validation with 10 datasets of 1000
documents
• Considered the following models/architectures:
– CNN
– LSTM
– Doc2Vec
– SVM
– BERT
25
Hyperparameter optimisation
• How can we choose the optimal
hyperparameters?
– Grid search
– Bayesian optimization
26
Grid search on single core model
27
Hyperparameters
RDD
.map()
Single
core
model
Single
core
model
Single
core
model
Single
core
model
Executor
Single
core
model Dataframe
28
Mlflow – single core
29
How does mlflow manage multithreading
in Spark?
30
• Created a class instead of using global variables
• Implemented __enter__, __exit__, __del__
methods for using the same notation
How does mlflow manage multithreading in
Spark?
class MtMlFlow:
def __init__(self, run_id=None,
experiment_name=None, experiment_id=None, mlflow_run:
Run = None):
@staticmethod
def get_experiment_id(experiment_name):
def __enter__(self):
def __exit__(self, exc_type, exc_val, exc_tb):
def __del__(self):
@property
def run(self):
@property
def run_id(self):
def log_param(self, key, value):
def set_tag(self, key, value):
def log_metric(self, key, value, step=None):
def log_metrics(self, metrics, step=None):
def log_params(self, params):
def set_tags(self, tags):
def log_artifact(self, local_path, …):
def log_artifacts(self, local_dir, …):
@classmethod
def create_experiment(cls, name, …):
def get_artifact_uri(self, artifact_path=None):
31
with MtMlFlow(experiment_name='/Shared/my_exp_results') as experiment:
experiment.log_param("batch", hyperparameters[0])
experiment.log_param("vector_size", hyperparameters[1])
experiment.log_param("window_size", hyperparameters[2])
experiment.log_param("cost", hyperparameters[3])
experiment.log_param("gamma", hyperparameters[4])
...
experiment.log_artifact(file_path)
Bayesian optimization for hyperparameter optimisation
32
Executor
Deep Learning Model
Executor
Deep Learning Model
Notebooks
Deep Learning Model
Hyperopt
Hyperparameters definition
Optimal
hyperparameter
combination
> 500 runsSparkTrials
Hyperopt example
33
Hyperparameters results
34
Hyperparameter optimisation
35
Number of
documents
Number of
executors
Number of CPU
per Executor
Time per
run
Elapsed
Time
Total
CPU time
Number
of runs
Grid search 10000 10 32 75m 8h 100 days 1920
Bayesian
optimisation (TPE*) 10000 10 2 GPU 25m 21h 9 days 500
TPE = Tree of Parzen Estimators
What results?
• The best model achieve F1 score of 98.5 %
36
Q&A
Thank you!
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT
40
Run_parallel implementation
41

More Related Content

What's hot

How to Handle NoSQL with a Relational Database
How to Handle NoSQL with a Relational DatabaseHow to Handle NoSQL with a Relational Database
How to Handle NoSQL with a Relational Database
DATAVERSITY
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Databricks
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Spark Summit
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...
HostedbyConfluent
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
Alluxio, Inc.
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Univa, an Altair Company
 
Machine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflowMachine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflow
Databricks
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
Databricks
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
Databricks
 
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
confluent
 
디자인 작업을 위한 일문 서체 사용 가이드
디자인 작업을 위한 일문 서체 사용 가이드디자인 작업을 위한 일문 서체 사용 가이드
디자인 작업을 위한 일문 서체 사용 가이드
Yuntak Jeong
 
https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...
https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...
https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...
Neo4j
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
Michael Kogan
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
Arnon Shimoni
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
Databricks
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
Data Works MD
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
Databricks
 
Kafka/SMM Crash Course
Kafka/SMM Crash CourseKafka/SMM Crash Course
Kafka/SMM Crash Course
DataWorks Summit
 

What's hot (20)

How to Handle NoSQL with a Relational Database
How to Handle NoSQL with a Relational DatabaseHow to Handle NoSQL with a Relational Database
How to Handle NoSQL with a Relational Database
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
Apache Spark the Hard Way: Challenges with Building an On-Prem Spark Analytic...
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
 
Machine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflowMachine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflow
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
 
디자인 작업을 위한 일문 서체 사용 가이드
디자인 작업을 위한 일문 서체 사용 가이드디자인 작업을 위한 일문 서체 사용 가이드
디자인 작업을 위한 일문 서체 사용 가이드
 
https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...
https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...
https://www.slideshare.net/neo4j/a-fusion-of-machine-learning-and-graph-analy...
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, SuccessesSQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Kafka/SMM Crash Course
Kafka/SMM Crash CourseKafka/SMM Crash Course
Kafka/SMM Crash Course
 

Similar to Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Clifford Chance

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
Databricks
 
Entry Points – How to Get Rolling with Big Data Analytics
Entry Points – How to Get Rolling with Big Data AnalyticsEntry Points – How to Get Rolling with Big Data Analytics
Entry Points – How to Get Rolling with Big Data Analytics
Inside Analysis
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
Inside Analysis
 
Acctiva: expertise in Business Intelligence, Data Warehousing, Data Governance
Acctiva: expertise in Business Intelligence, Data Warehousing, Data GovernanceAcctiva: expertise in Business Intelligence, Data Warehousing, Data Governance
Acctiva: expertise in Business Intelligence, Data Warehousing, Data Governance
Acctiva Ltd.
 
Age of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide DiscoveryAge of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide Discovery
Inside Analysis
 
Future ready
Future readyFuture ready
Future ready
Ben Turner
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data assetBala Iyer
 
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data LifecycleEnterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
NIXUnited
 
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data LifecycleEnterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
ErinDempsey17
 
Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...
Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...
Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...
Innovation Enterprise
 
CI or FS Poly Cleared Job Fair Handbook | May 18
CI or FS Poly Cleared Job Fair Handbook | May 18 CI or FS Poly Cleared Job Fair Handbook | May 18
CI or FS Poly Cleared Job Fair Handbook | May 18
ClearedJobs.Net
 
Web focus overview presentation 2015
Web focus overview presentation 2015Web focus overview presentation 2015
Web focus overview presentation 2015
Alexa Ramirez Espinosa
 
Graphs in Life Sciences
Graphs in Life SciencesGraphs in Life Sciences
Graphs in Life Sciences
Neo4j
 
reStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer Directory
reStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer DirectoryreStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer Directory
reStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer Directory
Ken Fuller
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Matt Stubbs
 
IDC Executive Programs Brochure
IDC Executive Programs Brochure IDC Executive Programs Brochure
IDC Executive Programs Brochure IDC Research Inc.
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera, Inc.
 
iiiQbets company profile
iiiQbets company profileiiiQbets company profile
iiiQbets company profile
BhagyaV2
 
ZIGRAM Introduction September 2020
ZIGRAM Introduction September 2020ZIGRAM Introduction September 2020
ZIGRAM Introduction September 2020
ZIGRAM
 
Cubodrom profile
Cubodrom profileCubodrom profile
Cubodrom profile
cubodrom
 

Similar to Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Clifford Chance (20)

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Entry Points – How to Get Rolling with Big Data Analytics
Entry Points – How to Get Rolling with Big Data AnalyticsEntry Points – How to Get Rolling with Big Data Analytics
Entry Points – How to Get Rolling with Big Data Analytics
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
 
Acctiva: expertise in Business Intelligence, Data Warehousing, Data Governance
Acctiva: expertise in Business Intelligence, Data Warehousing, Data GovernanceAcctiva: expertise in Business Intelligence, Data Warehousing, Data Governance
Acctiva: expertise in Business Intelligence, Data Warehousing, Data Governance
 
Age of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide DiscoveryAge of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide Discovery
 
Future ready
Future readyFuture ready
Future ready
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data LifecycleEnterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
 
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data LifecycleEnterprise Data Management: Managing your Business’s Entire Data Lifecycle
Enterprise Data Management: Managing your Business’s Entire Data Lifecycle
 
Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...
Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...
Actionable Analytics - Solving Real World Problems With Big Data, Xerox Innov...
 
CI or FS Poly Cleared Job Fair Handbook | May 18
CI or FS Poly Cleared Job Fair Handbook | May 18 CI or FS Poly Cleared Job Fair Handbook | May 18
CI or FS Poly Cleared Job Fair Handbook | May 18
 
Web focus overview presentation 2015
Web focus overview presentation 2015Web focus overview presentation 2015
Web focus overview presentation 2015
 
Graphs in Life Sciences
Graphs in Life SciencesGraphs in Life Sciences
Graphs in Life Sciences
 
reStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer Directory
reStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer DirectoryreStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer Directory
reStartEvents April 22nd DC metro Cleared Virtual Career Fair Employer Directory
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
 
IDC Executive Programs Brochure
IDC Executive Programs Brochure IDC Executive Programs Brochure
IDC Executive Programs Brochure
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
iiiQbets company profile
iiiQbets company profileiiiQbets company profile
iiiQbets company profile
 
ZIGRAM Introduction September 2020
ZIGRAM Introduction September 2020ZIGRAM Introduction September 2020
ZIGRAM Introduction September 2020
 
Cubodrom profile
Cubodrom profileCubodrom profile
Cubodrom profile
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 

Recently uploaded (20)

一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 

Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Clifford Chance

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Mirko Bernardoni & Michael Seddon Clifford Chance Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks #UnifiedDataAnalytics #SparkAISummit
  • 3. 3 Mirko’s main role is to build and lead the data science lab. He strongly believes that the work that Clifford Chance is doing in the research department offers a unique opportunity to shape and change the legal sector which is why he also work closely with universities for research. About Mirko Bernardoni MIRKO BERNARDONI Head of Data Science Clifford Chance https://www.linkedin.com/i/mirko-bernardoni
  • 4. 4 Michael’s focus is on solving Clifford Chance’s data engineering challenges. As a core member of the team, he has been involved in the design and development of the data pipelines crucial to the lab’s success, as well as delivering numerous machine learning projects helping to reshape Clifford Chance’s operational insights and execution. About Michael Seddon MICHAEL SEDDON Senior Machine Learning Engineer Clifford Chance https://www.linkedin.com/in/mrmichaelseddon/
  • 5. We are one of the world’s pre-eminent law firms with significant depth and range of resources across five continents Clifford Chance As a global law firm we are able to support clients at both a local and international level across Europe, Asia Pacific, the Americas, the Middle East and Africa. Our global, cross-discipline teams advise on a full range of legal solutions. We have a global view, and through our sector approach, a detailed understanding of our clients’ business, its drivers and competitive landscapes. Our Responsible Business strategy is integral to our firm strategy. It guides how we conduct our core business, how we develop and support our people, and how we foster closer collaboration with our clients. • Doing business – we promote market- shaping practices in relation to ethics, professional standards and risk management • People – we realise the potential of our people by creating a safe, healthy and inclusive workplace, and broadening our skills and experience • Community – we partner to support our community by widening access to justice, finance and education • Environment – we manage our environmental footprint and contribute to developing a more sustainable world. At Clifford Chance, we are committed to delivering a world-class service – providing the highest quality advice and support efficiently and effectively, every time. Our clients, who include corporate companies across all commercial and industrial sectors, governments, regulators, trade bodies and not-for-profit organisations are at the heart of how we work. Understanding what our clients value and aligning with their needs underpins our approach. We invest heavily to ensure that clients benefit from our formidable knowledge and market insights, that they have access to the best team for the job, and that we bring the right processes and advanced technologies to bear on each matter. Trusted legal advice for the world’s Leading businesses of today and tomorrow As a single, fully integrated, global partnership, we pride ourselves on our approachable, collegiate and team-based way of working. We are a single profit pool, lockstep partnership. Our ambition is to work collaboratively across geographies, practices, product areas and sectors to deliver the best advice and support to our clients. Our structure Our international network Responsible BusinessDelivering value to our clients
  • 6. Agenda Data Science in Law Architecture First success: Graph Analysis NLP deep learning in Spark & Databricks 6
  • 7. Data science in Law Why would a law firm get involved in creating a data science lab? 7
  • 8. Digital is changing how business gets done Current Law firm + Digital Technologies = Future Law Firm • Agile platforms and solutions designed change and adapt • Human augmentation (AI) • Draw better insight out of data convert intelligent into action • Simplify & digital work execution Digital Trends • Re-envision existing and enable new business models • Embrace different way of bringing people together • Develop new capabilities that help organizations transform themselves into digital organizations New regulations, markets • Staying ahead by anticipating what’s next • Industry competitiveness (panels, benchmarking)
  • 9. What we do in the Data Science Lab 360 information products We seek to better understand our data, diagnose, predict business outcomes and recommend what to do to achieve our goals. Operational insights, predictions & prescriptions Turn our data into new Client Products & Services Provide self-service 360 views of critical data, available to all; to research further particular topics and enhance the existing business intelligence reporting. Provide cross-cutting operational insights, Anticipate what will happen and recommend what to do to achieve goals. For example: • what’s the patterns to our profitability and what actions can we take? • can we better predict our fees? Seek to create new products and services based on our data & insights we can derive from it. • Up to your imagination!
  • 10. 10 ELIVER AI Adoption MATURITY CURVE Address Cloud challenges Innovation pipeline What are you trying to do? AI capability AImaturity From research projects to Client Solutions BI & Apps • Data driven business analytics & reporting Azure Data Services, SQL Server Power Platform (Power BI, Power Apps, Power Flow) SaaS with AI • Immediate actionable insights with AI Already productised: for example for due diligence, risk management, contract automation, Also with Office 365, Workplace Analytics) Ad hoc AI • AI Research • Big data insight • AI idea validations Azure cloud services Infrastructure as code New opportunities Speed to market AI “Accelerators” • Solution specific AI services & patterns Azure Cognitive Services Azure Bot Services Azure Search Industry accelerators AI Products • Operationalise AI • Product realisation • Data Science & Deep AI capabilities Open Frameworks Azure Databricks Azure Machine Learning Azure AI Infrastructure Legal specific
  • 11. DS-Lab end to end capability 11 Technology pioneers Evangelization DevOps / SecOps / Operationalization Idea & business process management Data Science • Research • Academic collaborations • Modelling (e.g. ML and DL) Data pipeline • Data extraction / collection • Transformation • Compliance • Confidentiality Production • Minimum Viable Product • AI productionisation • Product development
  • 12. Architecture The Legal Industry deals with confidential matters… is the data science lab system architecture appropriate? 12
  • 13. Confidentiality 13 User Access Time validity Contract limitations Client Contracts Audits!
  • 14. Line of BusinessData Science Dataset 1 Dataset 3 Dataset 2 Batch Stream ML+AI Community and topics detection ML+AI Relationship analysis ML+AI Decision Automation Data Engineering Data Processing Customers Volume Variety Velocity Many challenges in legal Blob storage Data Lake On-premise
  • 16. First success: Graph analysis What is the data science lab spark? 16
  • 17. 17 Understanding client relationships: a work in progress Understanding customer relationships is crucial to our business and revenue generation We have a constant battle getting people to use our CRM (customer relationship management); it is manually intensive, and people's time is pressured Use our existing data to uncover insights into our customer and client relationships Client relationship analysis
  • 18. 18 Client relationship analysis Key performance indicator Define datasets Academic literature & research Demo realization Production • Email system • HR system • CRM system • Others.. • Knowledge graph • Graph algorithms • Dataset analysis • Research implementation • User visualisation • Minimum viable product • Productionise the research • Agile Project • Confidential
  • 20. NLP deep learning in Spark & Databricks Let’s get serious 20
  • 21. 21 Document classification LIBOR, Brexit and similar exercise deals with huge number of documents. Document analysis requires the identification of the document type The ability of recognizing the document types is a requirement for many matters We often have urgent use cases from client tenders
  • 22. 22 Document classification Key performance indicator Define datasets Academic literature & research Demo realization Production • EDGAR • Text classification • Document classification • Deep learning for NLP • Work in progress • Minimum viable product • Productionise the research • Agile Project • Confidential
  • 23. Document example • Document size 300 pages or more (more than 150.000 words) 23
  • 24. Open EDGAR Open EDGAR 24 Data download and cleaning U.S.A SEC Open EDGAR Document Analysis Document Cleaning Blob Storage Text Extraction 15M raw documents 7M cleaned documents 250K state of the art Metadata
  • 25. Models development • Cross validation with 10 datasets of 1000 documents • Considered the following models/architectures: – CNN – LSTM – Doc2Vec – SVM – BERT 25
  • 26. Hyperparameter optimisation • How can we choose the optimal hyperparameters? – Grid search – Bayesian optimization 26
  • 27. Grid search on single core model 27 Hyperparameters RDD .map() Single core model Single core model Single core model Single core model Executor Single core model Dataframe
  • 29. 29
  • 30. How does mlflow manage multithreading in Spark? 30 • Created a class instead of using global variables • Implemented __enter__, __exit__, __del__ methods for using the same notation
  • 31. How does mlflow manage multithreading in Spark? class MtMlFlow: def __init__(self, run_id=None, experiment_name=None, experiment_id=None, mlflow_run: Run = None): @staticmethod def get_experiment_id(experiment_name): def __enter__(self): def __exit__(self, exc_type, exc_val, exc_tb): def __del__(self): @property def run(self): @property def run_id(self): def log_param(self, key, value): def set_tag(self, key, value): def log_metric(self, key, value, step=None): def log_metrics(self, metrics, step=None): def log_params(self, params): def set_tags(self, tags): def log_artifact(self, local_path, …): def log_artifacts(self, local_dir, …): @classmethod def create_experiment(cls, name, …): def get_artifact_uri(self, artifact_path=None): 31 with MtMlFlow(experiment_name='/Shared/my_exp_results') as experiment: experiment.log_param("batch", hyperparameters[0]) experiment.log_param("vector_size", hyperparameters[1]) experiment.log_param("window_size", hyperparameters[2]) experiment.log_param("cost", hyperparameters[3]) experiment.log_param("gamma", hyperparameters[4]) ... experiment.log_artifact(file_path)
  • 32. Bayesian optimization for hyperparameter optimisation 32 Executor Deep Learning Model Executor Deep Learning Model Notebooks Deep Learning Model Hyperopt Hyperparameters definition Optimal hyperparameter combination > 500 runsSparkTrials
  • 35. Hyperparameter optimisation 35 Number of documents Number of executors Number of CPU per Executor Time per run Elapsed Time Total CPU time Number of runs Grid search 10000 10 32 75m 8h 100 days 1920 Bayesian optimisation (TPE*) 10000 10 2 GPU 25m 21h 9 days 500 TPE = Tree of Parzen Estimators
  • 36. What results? • The best model achieve F1 score of 98.5 % 36
  • 37. Q&A
  • 39. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT
  • 40. 40