Sri Krishnamurthy
Fall 2023
Projects
Project 1
PDF Summary Tool: Students are tasked with creating
a Streamlit app that can summarize PDF documents.
They must choose between using nougat or pypdf
libraries to process PDFs from the SEC. The app should
allow users to select the library, test various PDFs,
provide pros/cons for each tool, and recommend one.
Additionally, students need to create and integrate
architectural diagrams of the project within Streamlit.
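One way to let users pick the PDF library is a small registry that maps a library name to an extraction function. The sketch below is only an illustration under stated assumptions: the names `EXTRACTORS`, `register`, and `extract` are hypothetical, only a pypdf extractor is shown, and a nougat-based extractor would have to be registered in the same way.

```python
from typing import Callable, Dict

# Maps the library name shown in the UI to a function that turns a PDF path into text.
# (Hypothetical registry; not part of the assignment text.)
EXTRACTORS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds an extractor to the registry under `name`."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        EXTRACTORS[name] = func
        return func
    return wrap


@register("pypdf")
def extract_with_pypdf(path: str) -> str:
    # Imported lazily so this module loads even when pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can yield nothing for scanned pages, hence the fallback to "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract(path: str, library: str) -> str:
    """Run the extractor the user selected, failing clearly on unknown names."""
    try:
        extractor = EXTRACTORS[library]
    except KeyError:
        raise ValueError(f"unknown library {library!r}; choose from {sorted(EXTRACTORS)}")
    return extractor(path)
```

In a Streamlit app, the keys of `EXTRACTORS` could feed a select box, so adding a second library would not change the calling code.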
Data Quality Evaluation Tool: This part involves
building a Streamlit tool using the Freddie Mac
single-family dataset. The tool, designed for data quality
evaluation, should allow users to upload CSV/XLS
files and specify their type (Origination/Monthly
Performance). The tool will use pandas-profiling to
summarize data and Great Expectations to validate data
schema, integrity, and completeness.
Architecture
Tools Used
• Streamlit
• Nougat or PyPDF libraries
• pandas-profiling
• Great Expectations
• Diagrams tool for architecture
Data Engineering and building tools to summarize
SEC and Freddie Mac datasets
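The three kinds of checks named for the data quality tool (schema, integrity, completeness) can be illustrated with the standard library alone. This is a plain-Python stand-in, not Great Expectations itself. The column names in `EXPECTED_COLUMNS` are a hypothetical subset, and the 300 to 850 credit score range is an assumption made for the example.

```python
import csv
import io
from typing import Dict, List

# Hypothetical subset of columns; the real origination file has many more fields.
EXPECTED_COLUMNS = ["credit_score", "loan_sequence_number", "original_upb"]


def validate_rows(csv_text: str, expected_columns: List[str]) -> Dict[str, object]:
    """Check schema (column names), completeness (blank cells), and one integrity rule."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    report: Dict[str, object] = {
        "schema_ok": list(columns) == expected_columns,
        "missing_values": {},
        "bad_credit_scores": 0,
        "row_count": 0,
    }
    if not report["schema_ok"]:
        return report  # Later checks assume the expected columns exist.
    missing = {name: 0 for name in expected_columns}
    bad_scores = 0
    rows = 0
    for row in reader:
        rows += 1
        for name in expected_columns:
            if not (row[name] or "").strip():
                missing[name] += 1
        score = (row["credit_score"] or "").strip()
        # Integrity rule (assumed range): a present score must be an integer in 300..850.
        if score and not (score.isdigit() and 300 <= int(score) <= 850):
            bad_scores += 1
    report.update(missing_values=missing, bad_credit_scores=bad_scores, row_count=rows)
    return report
```

The returned dictionary could be rendered directly in Streamlit, one entry per check, next to the profiling summary.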
Project 2
A system using Large Language Models to
summarize PDF documents from the SEC website.
The project for the Big Data and Intelligent Analytics
graduate course, as detailed in Assignment 2, involves
developing a tool for analysts to load PDF documents
and obtain summaries.
The project includes evaluating the nougat and pypdf
libraries for processing PDFs from the SEC, replicating
a demo from the OpenAI Cookbook, and creating
Jupyter notebooks that can handle SEC PDF
documents. Additionally, students are tasked with
designing FastAPI endpoints for a Streamlit app, updating
the app with new functionalities, and revising design
documents and architectural diagrams to reflect the
updates.
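Long filings usually exceed what a single model call can take, so a common pattern is to summarize chunks and then summarize the partial summaries. The sketch below shows that pattern under stated assumptions: `summarize` is a caller-supplied function (in the project it would wrap an OpenAI API call), `first_sentence` is only a placeholder summarizer, and the 200-word chunk size is arbitrary. It is not the Cookbook demo.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_document(text: str, summarize: Callable[[str], str], max_words: int = 200) -> str:
    """Summarize each chunk, then summarize the concatenated partial summaries."""
    chunks = chunk_text(text, max_words)
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))


def first_sentence(text: str) -> str:
    """Placeholder summarizer: keeps the text up to the first period."""
    head, _, _ = text.partition(".")
    return head.strip() + "."
```

Passing the summarizer as an argument keeps the chunking logic testable without network access or an API key.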
Tools Used
• Streamlit
• Nougat or PyPDF libraries
• FastAPI
• OpenAI APIs
• Great Expectations
• Diagrams tool for architecture
visualization
This project focused on
automating the creation of
embeddings and populating a
vector database. Key components
include:
Automating Embedding
Creation and Database
Population:
Airflow Pipelines: Two distinct
Airflow pipelines for data
acquisition, embedding
generation, and inserting records
into a Pinecone vector database
using SEC PDF files.
Data Processing and Validation:
Implement data validation,
generate embeddings, and save
file extracts.
Client-Facing Application
Development:
FastAPI and Streamlit: Develop
a user registration and login
system with JWT authentication.
Utilize a SQL database for storing
user credentials and application
logs.
Streamlit for User Interface:
Create a secure login page and a
question-answering interface, and
implement a search mechanism
using the Pinecone vector database.
Deployment: Containerize each
microservice and deploy on a
public cloud platform.
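The JWT authentication step can be made concrete with a standard-library sketch of HS256 signing and verification. This illustrates the mechanics only, and a real service would more likely rely on a maintained library such as PyJWT. The function names, the `sub` and `exp` claims chosen here, and the one-hour default lifetime are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(message: str, secret: bytes) -> str:
    return _b64url(hmac.new(secret, message.encode("utf-8"), hashlib.sha256).digest())


def create_token(username: str, secret: bytes, ttl_seconds: int = 3600,
                 now: Optional[float] = None) -> str:
    """Build header.payload.signature with an expiry claim."""
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url(json.dumps({"sub": username, "exp": issued + ttl_seconds}).encode("utf-8"))
    return f"{header}.{payload}.{_sign(header + '.' + payload, secret)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[str]:
    """Return the username if the signature is valid and the token is unexpired, else None."""
    try:
        header, payload, signature = token.split(".")
        expected = _sign(header + "." + payload, secret)
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
            return None
        claims = json.loads(_b64url_decode(payload))
    except ValueError:
        return None
    if not isinstance(claims, dict):
        return None
    current = time.time() if now is None else now
    if claims.get("exp", 0) < current:
        return None
    return claims.get("sub")
```

A FastAPI login route would return the token from `create_token`, and protected routes would call `verify_token` on the bearer token before serving a request.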
Project 3
Using LLMs and RAG for document summarization of
SEC documents
Tools Used
• Airflow
• Pinecone
• FastAPI
• JWT (JSON Web Token)
• SQL Database
• Streamlit
• Docker for containerization
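The retrieval half of this pipeline (embed text, store it, query by similarity) can be sketched without any external service. The class below is an in-memory stand-in for a vector database, not the Pinecone client, and `embed` is a toy bag-of-words embedding rather than a real embedding model. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized word counts (stand-in for a real embedding model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


class InMemoryIndex:
    """Minimal stand-in for a vector database: upsert records, query by cosine similarity."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[Dict[str, float], str]] = {}

    def upsert(self, record_id: str, text: str) -> None:
        """Insert or overwrite the record stored under `record_id`."""
        self._records[record_id] = (embed(text), text)

    def query(self, question: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """Return up to `top_k` (record_id, score) pairs, best match first."""
        q = embed(question)
        scored = [
            # Vectors are unit length, so the dot product is the cosine similarity.
            (record_id, sum(weight * vector.get(word, 0.0) for word, weight in q.items()))
            for record_id, (vector, _text) in self._records.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

In a retrieval-augmented setup, the text of the top-scoring records would be placed in the prompt alongside the user's question.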
Project 4
Using LLMs to interact with
Snowflake using natural language
Data Engineering with Snowpark Python: Students
individually reproduce steps in creating data pipelines
with Snowpark Python, showcasing their work in a
forked repository.
Dataset Analysis: Teams select datasets from
Snowflake's marketplace, creating thematic stories and
a Proof of Concept (POC) to address specific problems.
They design architectural diagrams and implement SQL
processes and User-Defined Functions, integrating
GitHub Actions for deployment.
Streamlit and OpenAI Integration: The project
involves connecting Snowflake with Streamlit for
analytics, developing a text-based SQL query feature
using natural language processing, and integrating
OpenAI services for query generation and refinement.
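A natural-language query feature typically has two parts: a prompt that gives the model the table definitions, and a guard that decides whether the generated SQL may run. The sketch below assumes a caller-supplied `generate_sql` function (in the project it would wrap an OpenAI call) and uses an `sqlite3` connection as a local stand-in for Snowflake. The guard is deliberately strict and is not a complete SQL safety check.

```python
import re
import sqlite3
from typing import Callable, List, Tuple


def build_prompt(schema_ddl: str, question: str) -> str:
    """Combine the table definitions and the user's question into one instruction."""
    return (
        "Given these tables:\n"
        f"{schema_ddl}\n"
        "Write a single SQL SELECT statement that answers:\n"
        f"{question}\n"
        "Return only the SQL."
    )


def is_safe_select(sql: str) -> bool:
    """Accept exactly one statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    # Any remaining semicolon suggests a second statement, so reject it outright.
    if ";" in statement:
        return False
    return re.match(r"(?i)select\b", statement) is not None


def answer(conn: sqlite3.Connection, schema_ddl: str, question: str,
           generate_sql: Callable[[str], str]) -> List[Tuple]:
    """Generate SQL for the question, refuse anything but a single SELECT, then run it."""
    sql = generate_sql(build_prompt(schema_ddl, question))
    if not is_safe_select(sql):
        raise ValueError(f"refusing to run generated SQL: {sql!r}")
    return conn.execute(sql).fetchall()
```

Running generated queries under a read-only database role would be a sensible second layer, since a text check alone can be bypassed.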
Tools Used
• Snowpark Python
• Snowflake Marketplace
• Streamlit
• OpenAI Services
• SQL Database Management
The project involves a thorough review
of the existing architecture
(Assignment 3) and its redesign using
two distinct approaches:
Open Source Components: Utilizing
primarily open-source tools like
Huggingface, LLAMA from Meta,
Amazon Bedrock, etc. The focus is on
creating a flexible and customizable
stack that aligns with the dynamic
needs of the enterprise.
Enterprise Alternatives to OpenAI
Stack: Incorporating enterprise
solutions such as Google Bard,
Anthropic, Cohere, Perplexity, etc. This
approach is geared towards leveraging
the robust and reliable frameworks
offered by leading tech organizations.
Architecture Design: Both use cases
will have detailed architecture
diagrams showcasing preparation
pipelines and inference aspects.
A comparison of the technologies in
terms of hosting and as-a-service
capabilities.
Technology Suitability Analysis:
Justification of selected technologies
based on application suitability.
Evaluation of scalability, reliability,
and performance metrics.
Cost Analysis: Detailed breakdown of
fixed and variable costs for both
architectures.
Analysis includes hosting, annual
licenses, maintenance, API access,
and use-case specific costs (e.g.,
PDF processing).
Comparative study of cost
structures between the original and
new architectures.
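The fixed-versus-variable comparison reduces to simple arithmetic: annual cost is the fixed cost plus the per-document cost times volume, and two architectures break even where those lines cross. The sketch below assumes a single variable cost per document. The class and function names, and every figure in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostModel:
    """Annual cost = fixed costs + per-document variable cost * volume."""
    name: str
    fixed_annual: float       # e.g. hosting, annual licenses, maintenance
    cost_per_document: float  # e.g. API access and PDF processing per document

    def annual_cost(self, documents_per_year: int) -> float:
        return self.fixed_annual + self.cost_per_document * documents_per_year


def break_even_documents(a: CostModel, b: CostModel) -> Optional[float]:
    """Volume at which both architectures cost the same, or None if there is no such point."""
    rate_gap = a.cost_per_document - b.cost_per_document
    if rate_gap == 0:
        return None  # Parallel cost lines never cross.
    volume = (b.fixed_annual - a.fixed_annual) / rate_gap
    return volume if volume >= 0 else None
```

With made-up figures, `CostModel("self-hosted", 12000.0, 0.5)` and `CostModel("api-based", 2000.0, 2.5)` break even at 5,000 documents per year: below that volume the API-based option is cheaper, and above it the self-hosted one is.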
Project 5
Project redesign and rearchitecture
Tools Used
Huggingface: For machine learning and natural language
processing tasks.
LLAMA from Meta: A language model for various analytical
tasks.
Amazon Bedrock: Managed AWS service for accessing foundation
models through an API.
Enterprise Components:
Google Bard: AI-driven data analysis and predictive
modeling.
Anthropic: Advanced AI solutions for complex data tasks.
Cohere: Provides tools for natural language understanding.

Big Data projects.pdf
