📣 Welcome to Day 6 of the UiPath Studio Web Workshop. In this session, we will dive into the essentials of PDF activities in UiPath Studio Web. Join us as we provide an overview of PDF automation, covering various activities related to data extraction and image extraction. Get ready for a hands-on experience with a live demonstration.
👉 Topics covered
📌 Task 1: PDF Automation Overview
Understanding PDF Automation in UiPath Studio Web
Consolidating PDF Activities
Live Demonstration
Speakers:
Vajrang Billakurthi, Digital Transformation Leader, Vajrang IT Services Pvt Ltd. and UiPath MVP
Swathi Nelakurthi, Associate Automation Developer, Vajrang IT Services Pvt Ltd
Rahul Goyal, Sr. Director, ERP Systems, Ellucian and UiPath MVP
👉 Visit the series page to register for all events.
• Overview:
- The "PDF Automation Overview" project provides a detailed exploration of PDF automation functionalities in UiPath
Studio Web.
- It encompasses a variety of PDF-related activities, including downloading, text extraction, merging, page range
extraction, image extraction, password protection, and file uploading to Orchestrator Storage.
• Variables Used:
- SamplePDF (Type: File): Variable for the downloaded Sample PDF file.
- ScannedPDF (Type: File): Variable for the downloaded Scanned PDF file.
- SamplePDFText (Type: Text): Variable to store text extracted from SamplePDF.
- ScannedPDFText (Type: Text): Variable to store text extracted from ScannedPDF.
- PDFPageCount (Type: Int32): Variable to store the page count of a PDF.
- MergedPDF (Type: File): Variable to store the merged PDF file.
- ExtractedPDFPageRange (Type: File): Variable to store pages extracted from a PDF based on a specified range.
- ExtractedPDFImages (Type: IEnumerable<ILocalResource>): Variable to store the list of extracted images from a PDF.
- PasswordProtectedPDF (Type: File): Variable to store the password-protected PDF file.
1. PDF Automation Overview
• Create storage buckets to manage PDF files
- PDF Inputs: Store input PDF files.
- PDF Outputs: Store output files generated during PDF automation.
• Upload files to respective buckets:
- Upload Sample.pdf to the PDF Inputs bucket.
- Upload Scanned.pdf to the PDF Inputs bucket.
• Activities:
- Utilize Orchestrator to create and manage storage buckets.
- Use Upload new File to upload files to the designated buckets.
Workflow Overview
• Text Extraction from Native PDF
• Extract text from Sample PDF:
- Utilize the "Extract PDF text" activity to extract text from the provided PDF file.
- Store the extracted text in a variable named "SamplePDFText".
• Write extracted text from Sample PDF to storage:
- Use the "Write Storage Text" activity to save the extracted text to a file.
- File Path*: Specify a file path with a ".txt" extension, such as "SamplePdf.txt."
• If the file already exists, the extracted text will be written to it. If not, a new text file will be created in the
specified storage bucket, and the extracted text will be written into it.
• Activities:
- Extract PDF text: This activity is used to extract text from the PDF file.
- Write Storage Text: This activity writes the extracted text to a file in the specified storage bucket.
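The write-or-create behavior described above can be sketched in plain Python. This is only a local file-system analogy, not the Orchestrator storage API: the function name `write_storage_text`, the folder standing in for the bucket, and the choice to overwrite (rather than append to) an existing file are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def write_storage_text(bucket_dir: Path, file_path: str, text: str) -> Path:
    """Hypothetical local stand-in for writing text to a file in a bucket."""
    target = bucket_dir / file_path
    # Create the "bucket" folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: existing content is replaced, not appended to.
    # If the file is missing, write_text creates it.
    target.write_text(text, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp) / "PDF Outputs"
    # First call creates SamplePdf.txt; second call writes to the existing file.
    first = write_storage_text(bucket, "SamplePdf.txt", "first extraction")
    second = write_storage_text(bucket, "SamplePdf.txt", "second extraction")
    final_text = second.read_text(encoding="utf-8")

print(final_text)
```

Both calls resolve to the same path, so the workflow does not need a separate "file exists" check before writing.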
• Text Extraction from Scanned PDF using OCR
• Extract text from Scanned PDF:
- Use the "Extract PDF text" activity to retrieve text from the scanned PDF file named "Scanned.pdf".
- Toggle on the option to apply OCR to extract text from images within the PDF.
- Store the OCR-extracted text in a variable named "ScannedPDFText".
• Write extracted text to storage:
- Use the "Write Storage Text" activity to save the OCR-extracted text to a file.
- File Path*: Specify a file path with a ".txt" extension, such as "ScannedPdf.txt."
• If the file already exists, the extracted text will be written to it. If not, a new text file will be created in the
specified storage bucket, and the OCR-extracted text will be written into it.
• Activities:
- Extract PDF text: This activity retrieves text from the PDF file, including text from images using OCR.
- Apply OCR: This option is toggled on to extract text from images within the PDF.
- Write Storage Text: This activity saves the OCR-extracted text to a file in the specified storage bucket.
• PDF Page Count
• Get total number of pages in PDF:
- Utilize the "Get PDF Page Count" activity to retrieve the total page count of the PDF.
• PDF file *: Specify the PDF file from which you want to obtain the page count (e.g., SamplePDF).
• The page count will be stored in the autogenerated variable "Page Count".
• Log PDF Page Count
- Use the "Log Message" activity to log the total page count.
• Message: "Total Pages in "+SamplePDFfile.FullName+" is "+pageCount.ToString
• Activities:
- Get PDF Page Count: This activity retrieves the total number of pages in the specified PDF document and stores it
in a variable.
- Log Message: Logs the total page count for reference or debugging purposes, indicating the PDF file name and its
total page count.
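The Log Message expression joins text with a number, which is why the page count is converted to a string first. A rough Python equivalent is sketched below; the function name, the file name "Sample.pdf", and the page count of 12 are hypothetical values chosen only for illustration.

```python
def page_count_message(file_name: str, page_count: int) -> str:
    """Build the log text; the integer must be converted before concatenation."""
    return "Total Pages in " + file_name + " is " + str(page_count)


# Hypothetical inputs: a file named Sample.pdf with 12 pages.
message = page_count_message("Sample.pdf", 12)
print(message)  # Total Pages in Sample.pdf is 12
```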
• Image Extraction from PDF
• Extract images from PDF:
- Use the "Extract PDF Images" activity to extract images from the PDF.
• Create a new variable named "ExtractedPDFImages" to store the extracted images.
• The activity autogenerates an output variable for the extracted images, but a descriptively named variable makes the
workflow easier to follow.
• Upload Each Extracted Image to PDF Outputs bucket
- Utilize a "For Each" activity to iterate through each extracted image.
• The autogenerated variable "CurrentItem" holds the current image being processed.
- Inside For Each Loop
• Use the "Upload Storage File" activity to upload each image file to the "PDF Outputs" bucket.
• Activities:
- Extract PDF Images: Extracts images from the PDF document.
- For Each: Iterates through each image file in the variable storing the extracted images.
- Upload Storage File: Uploads each extracted image to the specified storage bucket.
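The loop on this slide can be pictured with a short Python sketch. It is an analogy only: `LocalResource` here is a hypothetical stand-in for one extracted image, a plain dictionary plays the role of the "PDF Outputs" bucket, and the image names and bytes are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LocalResource:
    """Hypothetical stand-in for one image extracted from a PDF."""
    name: str
    data: bytes


def upload_all(images: Iterable[LocalResource], bucket: Dict[str, bytes]) -> List[str]:
    """Upload every image to the bucket, one per loop iteration."""
    uploaded = []
    for current_item in images:  # mirrors the "CurrentItem" loop variable
        bucket[current_item.name] = current_item.data
        uploaded.append(current_item.name)
    return uploaded


# Made-up images standing in for the ExtractedPDFImages variable.
extracted_pdf_images = [
    LocalResource("image_1.png", b"\x89PNG"),
    LocalResource("image_2.png", b"\x89PNG"),
]
pdf_outputs: Dict[str, bytes] = {}
uploaded_names = upload_all(extracted_pdf_images, pdf_outputs)
print(uploaded_names)  # ['image_1.png', 'image_2.png']
```

Each iteration handles exactly one image, so the number of uploads matches the number of images the extraction step returned.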
• Page Range Extraction
• Extract specific page range from PDF:
- Utilize the "Extract PDF Page Range" activity to generate a new PDF with specified page ranges.
• Provide the original PDF file from which the new PDF will be generated.
• Specify the page range to extract (e.g., "1,3-5" to extract pages 1, 3, 4, and 5).
• Create a new variable named "ExtractedPDFPageRange" to store the newly generated PDF.
• The activity autogenerates an output variable for the new PDF, but a descriptively named variable makes the
workflow easier to follow.
• Upload Newly Generated PDF:
- Upload the newly generated PDF to the "PDF Outputs" bucket.
• Specify the file to be uploaded as the "ExtractedPDFPageRange" variable.
• Define the path where you want to upload the file in the storage bucket as "PDFByRange.pdf".
• Activities:
- Extract PDF Page Range: This activity extracts a specified page range from the PDF and creates a new PDF
file.
- Upload Storage File: This activity uploads the generated PDF to the specified storage bucket.
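To make the page range notation concrete, the sketch below expands a specification such as "1,3-5" into individual page numbers. It illustrates the notation only; `parse_page_range` is a hypothetical helper, and how the activity itself parses or validates ranges is not documented in this deck.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a range string like "1,3-5" into a list of page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part}")
            # Ranges are inclusive on both ends.
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(parse_page_range("1,3-5"))  # [1, 3, 4, 5]
```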
• Password Protected PDF File
• Create password protected PDF:
- Utilize the "Set PDF Password" activity to encrypt Sample.pdf with a password.
• Provide the original PDF file from which the new password-protected PDF will be generated.
• Under Show Additional Options, enter the password in the New open password field (e.g., 123456).
• Create a new variable named "PasswordProtectedPDF" to store the newly generated PDF.
• The activity autogenerates an output variable for the new PDF, but a descriptively named variable makes the
workflow easier to follow.
• Upload the password protected PDF
- Use the "Upload Storage File" activity to upload the password-protected PDF to the PDF Outputs bucket.
• Specify the file to be uploaded as the "PasswordProtectedPDF" variable.
• Define the path where you want to upload the file in the storage bucket as "PasswordProtectedPDF.pdf".
• Activities:
- Set PDF Password: This activity encrypts the specified PDF file with a password.
- Upload Storage File: This activity uploads the generated PDF to the specified storage bucket.
• Merge PDF Files
• Merge Multiple PDF Files
- Utilize the "Merge PDF" activity to merge the "Sample PDF" and "Scanned PDF" into a single PDF.
- In the connection builder, add the PDF files to merge; the activity combines the original files into one new
PDF.
• Create a new variable named "MergedPDF" to store the newly generated PDF.
• The activity autogenerates an output variable for the merged PDF, but a descriptively named variable makes
the workflow easier to follow.
• Upload the merged PDF
- Use the "Upload Storage File" activity to upload the merged PDF to the PDF Outputs bucket.
• Specify the file to be uploaded as the "MergedPDF" variable.
• Define the path where you want to upload the file in the storage bucket as "MergedPDF.pdf".
• Activities:
- Merge PDF files: This activity merges multiple PDF files into a single PDF document.
- Upload Storage File: This activity uploads the merged PDF to the specified storage bucket.
• Summary and Conclusion
• Summary:
- PDF automation in UiPath Studio Web enables efficient handling of PDF files.
- Various activities facilitate text extraction, image extraction, page manipulation, and security
enhancements.
- Orchestrator integration simplifies storage and management of PDF files.
• Conclusion:
- With PDF automation capabilities, UiPath Studio Web empowers users to streamline PDF processing
tasks.
- Explore the range of PDF activities to enhance document management workflows and increase
productivity.