Aniruddha Chakrabarti
Cloud Consulting & Architecture Lead
Accenture - Cloud First, Data & AI
Agenda
1 Vector Databases, About Pinecone
2 Pinecone Organization, Project and Client libraries
3 Index and Collection
4 Data – Insert, Update, Delete
5 Data - Fetch & Search
6 Other
What is a Vector Database
• Applications that involve large language models, generative AI, and semantic search rely on vector
embeddings, a type of data that represents semantic information.
• Embeddings are generated by AI models (such as Large Language Models) and have a large number
of attributes or features, making their representation challenging to manage. In the context of AI and
machine learning, these features represent different dimensions of the data that are essential for
understanding patterns, relationships, and underlying structures.
• Traditional scalar-based databases can’t keep up with the complexity and scale of such data, making
it difficult to extract insights and perform real-time analysis.
• Vector databases like Pinecone offer optimized storage and querying capabilities for embeddings.
• First, use the embedding model to create vector embeddings for
the content we want to index.
• The vector embedding is inserted into the vector database, with some
reference to the original content the embedding was created from.
• When the application issues a query, we use the same embedding
model to create embeddings for the query, and use those embeddings
to query the database for similar vector embeddings. And as
mentioned before, those similar embeddings are associated with the
original content that was used to create them.
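A minimal end-to-end sketch of this index-then-query flow, assuming a hypothetical embed() helper that stands in for a real embedding model call (the index name, dimension and helper are illustrative, not from the deck):

import pinecone

def embed(text):
    # Placeholder standing in for a real embedding model call;
    # a real implementation would return the model's embedding vector.
    return [0.1, 0.2, 0.3, 0.4, 0.5]

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("content-index")   # assumes this index already exists with dimension=5

# 1. Embed the content and insert it; the record id is the reference back to the original content.
documents = {"doc-1": "Pinecone is a managed vector database.",
             "doc-2": "Vector embeddings capture semantic meaning."}
index.upsert([(doc_id, embed(text)) for doc_id, text in documents.items()])

# 2. Embed the query with the same model and search for similar vectors.
results = index.query(vector=embed("What is Pinecone?"), top_k=2)
for match in results.matches:
    print(match.id, match.score)          # ids map back to the original content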
About Pinecone
• Pinecone is a managed, cloud-native vector database/ DBaaS.
• A few other examples of vector databases include Qdrant, Milvus, Chroma and Weaviate.
Other databases like Postgres, Redis and SQLite provide extensions to handle vector data.
• Pinecone is currently one of the most popular vector databases.
• Suitable for storing vector embeddings (also called text embeddings) that provide long-term
memory to Large Language Models. Applications involving large language models, generative AI and
semantic search rely on vector embeddings.
• Vector embeddings or text embeddings are a type of data that represents semantic
information. This information allows AI applications to gain understanding and maintain a
long-term memory that they can draw upon when executing complex tasks.
• It helps in storing, querying, filtering and searching vector data.
• Operations are low latency and scale to billions of vectors.
• Pinecone is a NoSQL datastore, so it is eventually consistent.
• Provides a MongoDB-like API.
Pinecone object hierarchy
• Organization – An organization in Pinecone
is a higher-level construct that contains
multiple projects. So, it is a set of projects
that use the same billing.
• Project – Each organization contains one or
more projects that share the same
organization owners and billing settings.
• Index – An index is similar to a table in
relational databases. It contains records that
store vectors.
• Record – A record consists of an id, vector
data (array of float/ numbers) and metadata
in the form of key value pairs.
• Collection – used to back up an Index (along
with its associated data)
(Diagram: an Organization contains Projects; each Project contains Indexes and Collections; each Index contains Records (Id, Vector, Metadata). Billing and roles – organization owners and users – apply at the organization level.)
Pinecone Client in Python
• Install Pinecone client
pip install pinecone-client
• Import the Pinecone library and initialize the client by passing the Pinecone API key and the name of the
Pinecone environment.
import pinecone
import os
pinecone_api_key = os.environ.get('PINECONE_API_KEY')
env = os.environ.get('PINECONE_ENVIRONMENT')
pinecone.init(api_key=pinecone_api_key,
environment=env)
pinecone.version()
VersionResponse(server='2.0.11', client='2.2.4')
Organization in Pinecone
• A Pinecone organization is a set of projects that use the same billing.
• When an account is created in Pinecone, a project gets created automatically. Additional projects can
be created from Settings > Projects.
Project in Pinecone
• Each Pinecone project contains a number of indexes and users. Only a user who belongs to the
project can access the indexes in that project. Each project also has at least one project owner.
• Create a project in the Pinecone web console by specifying a name, cloud provider (GCP, AWS,
Azure), deployment location, and pod limit (maximum number of pods that can be used in total).
• The whoami method can be used to retrieve the project id –
pinecone.whoami()
WhoAmIResponse(username='bd6e4bc', user_label='default', projectname='f638a37')
• The project name can also be retrieved via the Config property. The environment, log
level, API key, etc. can be retrieved the same way –
pinecone.Config.PROJECT_NAME
pinecone.Config.ENVIRONMENT
pinecone.Config.API_KEY
Index
• An index is the highest-level organizational
unit of vector data in Pinecone (like a table in a
relational database).
• Pinecone indexes store records – each
record can contain vector data, which is an
array of floating-point numbers/ decimals.
• It accepts and stores vectors, serves queries
over the vectors it contains, and does other
vector operations over its contents.
• A Pinecone Project can contain many
Indexes. Each Index contains multiple
records.
(Diagram: Organization → Projects → Indexes → Records, where each Record holds a vector.)
Record
• A Pinecone Project can
contain many Indexes. Each
Index contains multiple
records.
• A record contains:
o an ID,
o Values – a vector, i.e. an array of float/ numbers,
o additional Metadata (optional).
• As Pinecone is a NoSQL
vector database, defining
a schema is not required.
(Diagram: Organization → Projects → Indexes.) Example records in an index:
ID   Values                  Metadata
1    [1.0, 2.0, …, 10.0]     {"key1": "val1", "key2": "val2"}
2    [1.0, 2.0, …, 10.0]     {"key1": "val1", "key2": "val2"}
N    [1.0, 2.0, …, 10.0]     {"key1": "val1", "key2": "val2"}
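A record expressed as a Python dictionary, in the dictionary form of upsert that the client also accepts (the same form is shown later for sparse vectors); the id, values and metadata here are illustrative, and the optional "metadata" field is assumed to be supported in this form:

record = {
    "id": "1",                                      # unique record id
    "values": [1.0, 2.0, 3.0, 4.0, 5.0],            # the vector – an array of floats
    "metadata": {"key1": "val1", "key2": "val2"}    # optional key-value pairs
}
index.upsert(vectors=[record])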
Index – creating
• Create an index by passing the index name and the size/ dimension of the vectors to be stored in the index.
pinecone.create_index(name="first-index", dimension=5)
• Additional parameters can be passed, like the distance metric type and the number of shards.
pinecone.create_index(name="second-index", dimension=5, metric="cosine", shards=1)
Index – creating
• The distance metric can be one of three types (a small sketch below illustrates what each measures) –
o Euclidean - used to calculate the distance between two data points in a plane. It is one of the most
commonly used distance metrics.
o Cosine - often used to find similarities between different documents. This is the default value. The
advantage is that the scores are normalized to the [-1, 1] range.
o Dotproduct - multiplies two vectors and tells us how similar the two vectors are. The
more positive the result, the closer the two vectors are in terms of their directions.
• The number of pods and the pod type can also be specified –
pinecone.create_index(name="third-index", dimension=10, metric="cosine", shards=3,
pods=5, pod_type="p2")
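A minimal sketch (plain numpy, not the Pinecone client) of what each metric computes for two vectors; it is only meant to illustrate the difference between the three:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

euclidean  = np.linalg.norm(a - b)                               # straight-line distance (smaller = closer)
cosine     = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))  # in [-1, 1] (larger = more similar)
dotproduct = a.dot(b)                                            # unnormalized similarity (larger = more similar)

print(euclidean, cosine, dotproduct)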
Index – listing, getting details and deleting
• List Pinecone indexes
pinecone.list_indexes()
['first-index']
• Get details of an index
pinecone.describe_index("second-index")
IndexDescription(name='second-index', metric='cosine', replicas=1, dimension=5.0,
shards=1, pods=1, pod_type='starter', status={'ready': True, 'state': 'Ready'},
metadata_config=None, source_collection='')
• Delete an index
pinecone.delete_index("first-index")
Index – scaling and configuring
• An index could be scaled up or down -
pinecone.scale_index(name='second-index', replicas=3)
• An index could be updated or reconfigured to a different pod type and replica count.
pinecone.configure_index(name='second-index', pod_type="s1", replicas=5)
• The following pod types are available –
o s1 – storage-optimized pods provide large storage capacity and lower overall costs, with slightly higher query
latencies than p1 pods. They are ideal for very large indexes with moderate or relaxed latency requirements.
o p1 - performance-optimized pods provide very low query latencies, but hold fewer vectors per pod than s1
pods. They are ideal for applications with low latency requirements (<100ms).
o p2 - the p2 pod type provides greater query throughput with lower latency. For vectors with fewer than 128
dimensions and queries where topK is less than 50, p2 pods support up to 200 QPS per replica and return
queries in less than 10ms. This means that query throughput and latency are better than s1 and p1.
o Starter – used in the free plan.
• Get statistics about the index -
index.describe_index_stats()
{'dimension': 5, 'index_fullness': 9e-05, 'namespaces': {'': {'vector_count': 9}},
'total_vector_count': 9}
Insert data
• To insert, update, delete or perform any other operation, get a reference to the index created before -
index = pinecone.Index("second-index")
• Insert a single record using the upsert method. The record contains an id, vector embeddings and
optional metadata -
index.upsert([("hello world", [1.0, 2.234, 3.34, 5.6, 7.8])])
• Insert multiple records using the same operation, passing all the records as an array -
index.upsert(
[
("Bangalore", [1.0, 2.234, 3.34, 5.6, 7.8]),
("Kolkata", [2.0, 1.234, 3.34, 5.6, 7.8]),
("Chennai", [3.0, 5.234, 3.34, 5.6, 7.8]),
("Mumbai", [4.0, 6.234, 3.34, 5.6, 7.8])
])
• Insert multiple records with metadata -
index.upsert(
[
("Delhi", [1.0, 2.234, 3.34, 5.6, 7.8], {"type": "city", "sub-type": "metro"}),
("Pune", [2.0, 1.234, 3.34, 5.6, 7.8], {"type": "city", "sub-type": "non-metro"})
])
Update data – partial and full update
• Pinecone supports full updates and partial updates of records.
• A full update allows updating both vector values and metadata, while a partial update allows updating either
vector values or metadata.
• To insert, update, delete or perform any other operation, get a reference to the index created before -
index = pinecone.Index("second-index")
• To partially update a record, use the update method.
The following code updates the vector values of the record -
index.update(id="hello world", values=[2.0, 3.1, 6.4, 9.6, 11.8])
The following code updates the metadata of the record -
index.update(id="hello world", set_metadata={"city1":"Blore", "city2":"Kolkata"})
• To fully update a record, use the same upsert method used earlier to insert the data -
index.upsert([("hello world", [10.0, 20.1, 31.4, 55.6, 75.8], {"city1":"Blore",
"city2":"Kolkata", "city3":"Delhi"})])
Backup data using Collection
• Collections are used to back up an index. A collection is a static copy of your index that only
consumes storage.
• Create a collection by specifying the name of the collection and the name of the source index -
pinecone.create_collection(name="backup-collection", source="first-index")
pinecone.list_collections()
['backup-collection']
pinecone.describe_collection("backup-collection")
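A collection can also be used as the source for a new index, restoring the backed-up data. A minimal sketch, assuming create_index accepts the source_collection parameter that describe_index reports (the index name is illustrative):

pinecone.create_index(name="restored-index", dimension=5, source_collection="backup-collection")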
Collection – listing, getting details and deleting
• All existing collections can be listed easily -
pinecone.list_collections()
['backup-collection']
• Get details of a collection -
pinecone.describe_collection("backup-collection")
• Delete a collection -
pinecone.delete_collection("backup-collection")
Dense vectors vs Sparse vectors
• Pinecone supports both dense vectors and sparse vectors.
• So far, we have only worked with dense vectors.
Index 1
ID   Dense Vector            Sparse Vector                         Metadata
1    [1.0, 2.0, …, 10.0]     indices [1,2], values [10.0, 20.5]    {"key1": "val1", "key2": "val2"}
2    [1.0, 2.0, …, 10.0]     indices [1,2], values [10.0, 20.5]    {"key1": "val1", "key2": "val2"}
N    [1.0, 2.0, …, 10.0]     indices [1,2], values [10.0, 20.5]    {"key1": "val1", "key2": "val2"}
Both a dense vector and a sparse vector can be part of the same record.
Inserting sparse data
• Sparse vector values can be upserted alongside dense vector values -
index.upsert(vectors=[{'id': 'id1', 'values': [0.1, 0.2, 0.3, 0.4, 0.5],
'sparse_values':{ 'indices':[1,2], 'values': [10.0, 20.5] } }])
• Note that you cannot upsert a record with sparse vector values but without dense vector values - the following fails:
index.upsert(vectors=[{'id': 'id1', 'sparse_values':{ 'indices':[1,2], 'values':
[10.0, 20.5] } }])
ValueError: Vector dictionary is missing required fields: ['values']
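A sparse vector can also be supplied at query time for hybrid (dense + sparse) search. A minimal sketch, assuming the query method accepts a sparse_vector parameter alongside the dense query vector (the values are illustrative):

index.query(
    vector=[0.1, 0.2, 0.3, 0.4, 0.5],                           # dense part of the query
    sparse_vector={'indices': [1, 2], 'values': [10.0, 20.5]},  # sparse part of the query
    top_k=3,
    include_values=True)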
Fetch data and update data
• Fetch data by passing the ids of the records -
index.fetch(ids=['Bangalore', 'Kolkata'])
{'namespace': '',
'vectors': {'Bangalore': {'id': 'Bangalore', 'metadata': {}, 'values': [1.0, 2.234,
3.34, 5.6, 7.8]},
'Kolkata': {'id': 'Kolkata', 'metadata': {}, 'values': [2.0, 1.234,
3.34, 5.6, 7.8]}}}
• Update a record – both values (vector) and metadata can be updated -
index.update(id='Bangalore', set_metadata={"type":"city", "sub-type":"non-metro"},
values=[1.0, 2.0, 3.0, 4.0, 5.0])
index.fetch(ids=['Bangalore'])
{'namespace': '',
'vectors': {'Bangalore': {'id': 'Bangalore', 'metadata': {'sub-type': 'non-metro',
'type': 'city'}, 'values': [1.0, 2.0, 3.0, 4.0, 5.0]}}}
Query data
• Query data by vector match
index.query(vector=[1.0, 2.0, 3.0, 5, 7.0], top_k=3, include_values=True)
{'matches': [{'id': 'Bangalore', 'score': 0.999296784, 'values': [1.0, 2.0, 3.0,
5.0, 7.0]},
{'id': 'Delhi', 'score': 0.997676671, 'values': [1.0, 2.234, 3.34,
5.6, 7.8]},
{'id': 'Den Haag', 'score': 0.997676671, 'values': [1.0, 2.234, 3.34,
5.6, 7.8]}], 'namespace': ''}
• Apart from the id and values (vector data), the metadata of each matched record can also be retrieved -
index.query(vector=[1.0, 2.0, 3.0, 5, 7.0], top_k=3, include_values=True,
include_metadata=True)
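Queries can also be narrowed with a metadata filter. A minimal sketch, assuming the query method accepts a filter parameter with MongoDB-style operators such as $eq (the filter itself is illustrative):

index.query(
    vector=[1.0, 2.0, 3.0, 5.0, 7.0],
    top_k=3,
    include_metadata=True,
    filter={"type": {"$eq": "city"}, "sub-type": {"$eq": "metro"}})   # only records whose metadata matches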
Namespace
• Pinecone allows you to partition the records in an index into namespaces. Queries and other
operations are then limited to one namespace, so different requests can search different subsets of
your index.
• A new namespace can be created by inserting a record into an index while specifying a new namespace -
index.upsert(
vectors = [
("Howrah", [1.0, 2.234, 3.34, 5.6, 7.8], {"type": "city", "sub-type":
"metro"}),
("Siliguri", [2.0, 1.234, 3.34, 5.6, 7.8], {"type": "city", "sub-type":
"non-metro"})
], namespace='my-first-namespace')
• By default, each index contains a single default namespace.
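Subsequent operations can then target that namespace explicitly; a minimal sketch of a query restricted to the namespace created above (the query vector is illustrative):

index.query(vector=[1.0, 2.0, 3.0, 5.0, 7.0], top_k=2, include_metadata=True,
            namespace='my-first-namespace')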
