Key Achievements
During my work at Justis Publishing
19/09/2011 – 29/08/2013
Justis Publishing is an independent publisher of electronic legal information. It owns one of the
biggest online collections of UK and international case law, as well as other legal documents,
including UK and international legislation, acts, and parliamentary resources and publications.
On a daily basis the company receives hundreds of documents from its data providers: legal information
publishers such as England and Wales Civil Appeal Judgments, Canada Law Reports, Bermuda Law Reports,
and Jersey Law Reports, as well as various legal publications from Cambridge University Press, Oxford
University Press, LexisNexis and others.
Justis Publishing faced a problem in processing incoming data from its data providers. The projects
posed several challenges. The biggest was that every data provider used different types of source data
files, exported, extracted or copied from disparate systems in many different formats, which
prevented quick and easy integration and information sharing. Many of the incoming file types had
different designs, coding standards and business rules, and mixed structured with unstructured
content, leaving a large margin for error. The vast majority of the data was not properly structured.
Documents from a single data source often contained many types of irregularities and inconsistencies, such as:
• unclosed tags in XML files, which made them inaccessible to further automatic processing
• missing carriage-return or newline characters in text files (typical of text files exported from Unix systems)
• leading blank lines
• unrecognized symbols
• non-compliant encodings that were difficult to parse
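The irregularities listed above can be illustrated with a minimal Python sketch. It is not the original solution (which was built around SSIS); the function names `clean_text` and `is_well_formed_xml` and the choice of Windows-1252 as the fallback encoding are assumptions made purely for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET


def clean_text(raw: bytes) -> str:
    """Decode and normalize one incoming text file (hypothetical helper)."""
    # Try UTF-8 first ("utf-8-sig" also drops a byte-order mark); fall back to
    # Windows-1252 (an assumed fallback), replacing bytes that cannot be decoded.
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")

    # Unify line endings: CRLF and bare CR both become LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Drop control characters other than newline and tab.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )

    # Remove leading blank lines.
    lines = text.split("\n")
    while lines and not lines[0].strip():
        lines.pop(0)
    return "\n".join(lines)


def is_well_formed_xml(document: str) -> bool:
    """Return False for XML that cannot be parsed, e.g. because of unclosed tags."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False
```

In a pipeline of this kind, files failing the well-formedness check would typically be routed to a repair or review step rather than loaded directly.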
Justis Publishing had struggled with this problem since the beginning of the business and had
never solved it completely. Because Justis Publishing depends heavily on its data providers, the problem
had a very large impact on the business.
As an SSIS developer, I was responsible for ETL processes and tools, as well as database loading
and manipulation, ensuring that the design complied with requirements, established
methodologies, and best practices. In doing so I used various data audit and validation
methods and procedures to ensure the quality and effectiveness of legacy data conversions to
the data warehouse. I then designed the solution by conceptualizing data flows and
transformations, translating those into a sequence of steps, performing source-to-target data
mapping, developing code, debugging, and testing. I used various ETL tools
and scripting languages, including Ruby, C#, VBA, Java and JavaScript, and developed shell scripts
and stored procedures to support the new solution.
The solution I developed extracts data from the files; performs various cleansing, validation,
transformation and conversion steps; stores the data in a relational database; and then
establishes interlinks between documents using advanced data processing based
on a fuzzy logic algorithm.
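The interlinking step can be sketched as approximate string matching between a reference found in a new document and the titles already stored. The source does not say which fuzzy algorithm was used, so this example substitutes Python's standard `difflib.SequenceMatcher` as a stand-in; the function names, the sample case titles and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalize(title: str) -> str:
    """Lower-case, drop full stops and collapse whitespace before comparing."""
    return " ".join(title.lower().replace(".", "").split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(reference: str,
               known_titles: Iterable[str],
               threshold: float = 0.85) -> Optional[str]:
    """Return the stored title most similar to `reference`, or None if
    nothing reaches the (assumed) threshold."""
    scored = [(similarity(reference, title), title) for title in known_titles]
    if not scored:
        return None
    score, title = max(scored)
    return title if score >= threshold else None


# Illustrative data only.
known = ["Donoghue v Stevenson", "Carlill v Carbolic Smoke Ball Co"]
link = best_match("Donoghue v. Stevenson", known)
```

A threshold trades missed links against false links; in practice it would be tuned against a sample of manually verified citations.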
Once the solution was deployed, I monitored the performance of the ETL processes and corrected any
identified issues by performing root cause analysis on problematic queries and ETL jobs.
The projects have revolutionised the way Justis Publishing works with its data providers:
• Data from multiple sources moves far more efficiently and in a scalable way
• All data is processed within minutes of arrival, not the months it used to take
• Interlinking between new and existing legal cases has improved significantly
