Python is a widely adopted language for data science and analysis. Eland is a Python client and toolkit for DataFrames, big data, machine learning, and ETL in Elasticsearch. Get an introduction to Eland with a hands-on demo where you’ll learn about Eland’s DataFrame implementation and how to manage machine learning models with it.
Eland: A Python client for data analysis and exploration
1. Bridge the Gap: Python Data Science and Elasticsearch
Seth Michael Larson
Software Engineer, Clients
2.
This presentation and the accompanying oral presentation contain forward-looking statements, including statements
concerning plans for future offerings; the expected strength, performance or benefits of our offerings; and our future
operations and expected performance. These forward-looking statements are subject to the safe harbor provisions
under the Private Securities Litigation Reform Act of 1995. Our expectations and beliefs in light of currently
available information regarding these matters may not materialize. Actual outcomes and results may differ materially
from those contemplated by these forward-looking statements due to uncertainties, risks, and changes in
circumstances, including, but not limited to those related to: the impact of the COVID-19 pandemic on our business
and our customers and partners; our ability to continue to deliver and improve our offerings and successfully
develop new offerings, including security-related product offerings and SaaS offerings; customer acceptance and
purchase of our existing offerings and new offerings, including the expansion and adoption of our SaaS offerings;
our ability to realize value from investments in the business, including R&D investments; our ability to maintain and
expand our user and customer base; our international expansion strategy; our ability to successfully execute our
go-to-market strategy and expand in our existing markets and into new markets, and our ability to forecast customer
retention and expansion; and general market, political, economic and business conditions.
Additional risks and uncertainties that could cause actual outcomes and results to differ materially are included in
our filings with the Securities and Exchange Commission (the “SEC”), including our Annual Report on Form 10-K for
the most recent fiscal year, our quarterly report on Form 10-Q for the most recent fiscal quarter, and any
subsequent reports filed with the SEC. SEC filings are available on the Investor Relations section of Elastic’s
website at ir.elastic.co and the SEC’s website at www.sec.gov.
Any features or functions of services or products referenced in this presentation, or in any presentations, press
releases or public statements, which are not currently available or not currently available as a general availability
release, may not be delivered on time or at all. The development, release, and timing of any features or functionality
described for our products remains at our sole discretion. Customers who purchase our products and services
should make the purchase decisions based upon services and product features and functions that are currently
available.
All statements are made only as of the date of the presentation, and Elastic assumes no obligation to, and does not
currently intend to, update any forward-looking statements or statements relating to features or functions of services
or products, except as required by law.
Forward-Looking Statements
3. What Gap are we Bridging?
Data Science: Pandas, NumPy, Jupyter Notebook, scikit-learn, XGBoost, LightGBM
Elastic Stack: Elasticsearch, Elastic ML, Horizontal Scaling, Distributed Computing
4. What is Eland?
• Free and Open Source Python library
• Available on PyPI and Conda Forge
• Supports Python 3.6 and Elasticsearch 7
• Currently in beta, working towards general availability (GA)
6. What can Eland do?
Data Frames
• Pandas-compatible API
• Explore data in Jupyter Notebooks
• Scale storage and compute with your cluster
Machine Learning
• Build machine learning models locally with Python
• Execute and scale models in Elasticsearch
7. Follow along in your Browser!
• https://bit.ly/eland-demo
• Jupyter Notebook with everything pre-configured
• Connected to a read-only cluster on Elastic Cloud
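9. What is a Data Frame?
_id    DestCityName    AvgTicketPrice    DestLocation
0      Sydney          841.265           [151.17, -33.94]
1      Venice          882.982           [12.35, 45.50]
...    ...             ...               ...
N-1    Treviso         181.694           [12.19, 45.64]
N      Xi'an           730.041           [108.75, 34.44]
dtype  object/keyword  float64/double    object/geo_point
Labeled parts: Data Frame (the whole table), Row (one record), Column (one named field), Dtype (each column's type, written as pandas dtype/Elasticsearch type), Index (the _id row labels)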
14. Mapping Data Frame Concepts to Elasticsearch
Data Frame -> Indices
Row -> Document
Column -> Field
Dtype -> Field Type
Index -> Doc ID
Index -> Sort Order
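To make the mapping concrete, here is a minimal stdlib-only sketch (not Eland's implementation): each row becomes a document whose `_id` is the row's index label, each column becomes a field, and each dtype selects a field type. The dtype table, the `build_mapping` and `rows_to_documents` helpers, and the `flights` index name are assumptions for illustration. A geo_point column has dtype `object` in pandas, so its field type has to be supplied explicitly.

```python
import json
from typing import Any, Dict, List

# Assumed dtype -> field type table, for illustration only; Eland keeps its own
# (more complete) mapping internally.
PANDAS_TO_ES_TYPE = {
    "object": "keyword",
    "float64": "double",
    "int64": "long",
    "bool": "boolean",
    "datetime64[ns]": "date",
}


def build_mapping(dtypes: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, Any]:
    """Column -> field and dtype -> field type.

    `overrides` covers columns whose pandas dtype does not decide the field type,
    such as a geo point stored in an `object` column.
    """
    properties = {
        column: {"type": overrides.get(column, PANDAS_TO_ES_TYPE[dtype])}
        for column, dtype in dtypes.items()
    }
    return {"mappings": {"properties": properties}}


def rows_to_documents(index_name: str, rows: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Row -> document and index label -> _id, as bulk-style action dicts."""
    return [
        {"_index": index_name, "_id": row_id, "_source": source}
        for row_id, source in rows.items()
    ]


dtypes = {"DestCityName": "object", "AvgTicketPrice": "float64", "DestLocation": "object"}
mapping = build_mapping(dtypes, overrides={"DestLocation": "geo_point"})

# Two rows from the sample table; geo points are given as [lon, lat] arrays.
rows = {
    "0": {"DestCityName": "Sydney", "AvgTicketPrice": 841.265, "DestLocation": [151.17, -33.94]},
    "1": {"DestCityName": "Venice", "AvgTicketPrice": 882.982, "DestLocation": [12.35, 45.50]},
}
actions = rows_to_documents("flights", rows)  # "flights" is a hypothetical index name

print(json.dumps(mapping, indent=2))
print(actions[0]["_id"], actions[0]["_source"]["DestCityName"])
```

Action dicts of this shape (`_index`, `_id`, `_source`) are what the elasticsearch-py bulk helper accepts, which is why the sketch produces them rather than raw JSON lines.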
15. Distributed Scalable Data Processing
Construct a query or aggregation with Eland APIs -> The Elasticsearch cluster executes the computation -> Results are converted into Pandas objects
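The three steps above can be sketched in plain Python. This is a hypothetical illustration of the pattern, not Eland's query compiler: a pandas-style expression is compiled into one search request (a `range` query plus an `avg` aggregation), a stand-in function plays the role of the cluster, and only the single aggregated number comes back to the client. `compile_filtered_mean`, `fake_cluster_search`, and `to_result` are invented names; the request body uses standard Query DSL.

```python
from typing import Any, Dict, List, Optional


def compile_filtered_mean(filter_field: str, gt: float, mean_field: str) -> Dict[str, Any]:
    """Sketch: df[df[filter_field] > gt][mean_field].mean() as one search request."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {"range": {filter_field: {"gt": gt}}},
        "aggs": {"result": {"avg": {"field": mean_field}}},
    }


def fake_cluster_search(body: Dict[str, Any], docs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical stand-in for Elasticsearch: evaluates only the range query and avg agg above."""
    field, condition = next(iter(body["query"]["range"].items()))
    matched = [doc for doc in docs if doc[field] > condition["gt"]]
    agg_field = body["aggs"]["result"]["avg"]["field"]
    values = [doc[agg_field] for doc in matched]
    avg: Optional[float] = sum(values) / len(values) if values else None
    return {
        "hits": {"total": {"value": len(matched), "relation": "eq"}, "hits": []},
        "aggregations": {"result": {"value": avg}},
    }


def to_result(response: Dict[str, Any]) -> Optional[float]:
    """Client side: a single number is all that lands in Python memory."""
    return response["aggregations"]["result"]["value"]


# Rows from the sample table; on a real cluster these never leave Elasticsearch.
docs = [
    {"DestCityName": "Sydney", "AvgTicketPrice": 841.265},
    {"DestCityName": "Venice", "AvgTicketPrice": 882.982},
    {"DestCityName": "Treviso", "AvgTicketPrice": 181.694},
    {"DestCityName": "Xi'an", "AvgTicketPrice": 730.041},
]
body = compile_filtered_mean("AvgTicketPrice", 500, "AvgTicketPrice")
mean_price = to_result(fake_cluster_search(body, docs))
print(mean_price)
```

With a real cluster, the same body would be sent to the `_search` endpoint in place of the stand-in; the design point is that the computation moves to the data rather than the data to the computation.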
17. Data Frames with Superpowers
Powered by Elasticsearch
“Businesses within 1km with the word ‘red’ in the name”
import eland as ed

# "businesses" is a hypothetical index with a text "name" field and a geo_point "location" field
df = ed.DataFrame("localhost:9200", "businesses")
df.es_query({
    "bool": {
        "must": {
            "match": {"name": "red"}
        },
        "filter": {
            "geo_distance": {
                "distance": "1km",
                "location": {
                    "lat": 40,
                    "lon": -70
                }
            }
        }
    }
})
Using the es_query() method, you can filter and transform data with the Query DSL.
Access your Data where it Lives
Query and aggregate logs and event data directly in Elasticsearch, without time-consuming exports.
Storage Scales with your Cluster
Let Elasticsearch handle the heavy lifting, and process only the query results in memory.
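For intuition about what the cluster evaluates in the es_query() example, here is a rough local equivalent of that `bool` query in plain Python. It is only an approximation under stated assumptions: the `match` clause is treated as a lowercase whole-word lookup (ignoring analyzers and relevance scoring), the `geo_distance` filter is computed with the haversine formula (Elasticsearch's own distance calculation may differ slightly), and the businesses are made up. Running this check over every document is exactly the work that es_query() leaves to the cluster.

```python
import math
import re
from typing import Any, Dict, List

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def matches(doc: Dict[str, Any], term: str = "red", lat: float = 40.0,
            lon: float = -70.0, max_distance_m: float = 1000.0) -> bool:
    # "must" clause: match on "name", approximated as a lowercase word lookup
    # (roughly what the standard analyzer does to both text and query).
    if term.lower() not in re.findall(r"\w+", doc["name"].lower()):
        return False
    # "filter" clause: geo_distance, approximated with the haversine formula.
    loc = doc["location"]
    return haversine_m(lat, lon, loc["lat"], loc["lon"]) <= max_distance_m


# Hypothetical businesses around the query point (40, -70).
businesses: List[Dict[str, Any]] = [
    {"name": "Red Door Bakery", "location": {"lat": 40.005, "lon": -70.0}},  # ~0.56 km away
    {"name": "Big Red Diner", "location": {"lat": 40.02, "lon": -70.0}},     # ~2.2 km away
    {"name": "Blue Cafe", "location": {"lat": 40.0, "lon": -70.0}},          # no "red" in name
    {"name": "Redwood Books", "location": {"lat": 40.0, "lon": -70.001}},    # "redwood" is another token
]
print([b["name"] for b in businesses if matches(b)])
```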
18. Explore Data in Elasticsearch
Pandas Workflow for Exploring Data
Familiar APIs for exploring your dataset, such as info(), describe(), and more.
Integrates with Jupyter Notebook
Your data looks and feels local at every step, with rich output in Jupyter Notebooks.
Export to Pandas Anytime
Bring your data local with to_pandas() to access the entire Pandas API.
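Summary methods like describe() fit the same pushdown pattern. The sketch below assumes, rather than documents, how this can work: one request carrying an `extended_stats` and a `percentiles` aggregation (both real Elasticsearch aggregations) is reshaped into the familiar count/mean/std/min/quartiles/max summary. The response shape, the helper names, and the in-memory stand-in for the cluster are assumptions; the real `percentiles` aggregation is approximate, while the stand-in computes exact quartiles.

```python
import math
import statistics
from typing import Any, Dict, List


def describe_request(field: str) -> Dict[str, Any]:
    """One search request that carries every statistic describe() reports."""
    return {
        "size": 0,
        "aggs": {
            "stats": {"extended_stats": {"field": field}},
            "pcts": {"percentiles": {"field": field, "percents": [25, 50, 75]}},
        },
    }


def describe_from_response(response: Dict[str, Any]) -> Dict[str, float]:
    """Reshape the aggregation results into pandas describe() order."""
    stats = response["aggregations"]["stats"]
    pcts = response["aggregations"]["pcts"]["values"]
    n = stats["count"]
    # pandas reports the sample standard deviation (ddof=1); this assumes
    # "variance" in the response is the population variance, so rescale it.
    std = math.sqrt(stats["variance"] * n / (n - 1)) if n > 1 else float("nan")
    return {
        "count": n,
        "mean": stats["avg"],
        "std": std,
        "min": stats["min"],
        "25%": pcts["25.0"],
        "50%": pcts["50.0"],
        "75%": pcts["75.0"],
        "max": stats["max"],
    }


def fake_cluster_response(values: List[float]) -> Dict[str, Any]:
    """Hypothetical stand-in for the cluster's reply, using exact statistics."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "aggregations": {
            "stats": {
                "count": len(values),
                "avg": statistics.fmean(values),
                "variance": statistics.pvariance(values),
                "min": min(values),
                "max": max(values),
            },
            "pcts": {"values": {"25.0": q25, "50.0": q50, "75.0": q75}},
        }
    }


prices = [841.265, 882.982, 181.694, 730.041]  # AvgTicketPrice values from the sample table
summary = describe_from_response(fake_cluster_response(prices))
for name, value in summary.items():
    print(f"{name:>5}  {value:.3f}")
```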
20. Deploying Trained Models is Tough
from sklearn.tree import DecisionTreeClassifier

### Train a scikit-learn model
sk_model = DecisionTreeClassifier()
sk_model.fit(X_train, y_train)
# TODO: ...? :(
Trained Model to ...what now?
• How will I get data into my model?
• Do I need to build a web service?
• Where will it be hosted?
• How can I scale my model?
21. Trained Model to Production!
from sklearn.tree import DecisionTreeClassifier
from eland.ml import MLModel

### Train a scikit-learn model
# X_train, y_train, X_test, y_test and data.feature_names come from your own dataset (not shown)
sk_model = DecisionTreeClassifier()
sk_model.fit(X_train, y_train)

### Import model into Elasticsearch
es_model = MLModel.import_model(
    es_client="localhost:9200",
    model=sk_model,
    model_id="example-model-id",
    feature_names=data.feature_names,
)

### Execute a prediction
es_model.predict(X_test) == y_test
# [True, True, True]
Operationalize Models with Elastic ML
Build models on your laptop; Eland handles deployment to Elasticsearch via import_model().
Execute Model from Jupyter Notebook
Execute Elastic ML models on individual datasets with a familiar predict() API.
Execute Model on Ingest
Run ML inference on incoming documents as part of an ingest pipeline, using the inference processor.
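An ingest pipeline that runs the imported model on incoming documents might look like the body built below. This is a sketch assuming the Elasticsearch 7.x `inference` processor options `model_id`, `target_field`, and `field_map` (which maps incoming document fields to the model's feature names); the target field and field names are hypothetical, so check the processor reference for your version before relying on it.

```python
import json
from typing import Any, Dict


def inference_pipeline(model_id: str, target_field: str, field_map: Dict[str, str]) -> Dict[str, Any]:
    """Build an ingest pipeline body containing a single inference processor."""
    return {
        "description": f"Run trained model {model_id} on incoming documents",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    # Keys: field names in incoming documents; values: the model's feature names
                    "field_map": field_map,
                }
            }
        ],
    }


pipeline = inference_pipeline(
    model_id="example-model-id",                 # the ID passed to import_model() on the slide
    target_field="ml.prediction",                # hypothetical output field
    field_map={"raw_price": "AvgTicketPrice"},   # hypothetical field names
)
print(json.dumps(pipeline, indent=2))
```

The resulting body would be stored with `PUT _ingest/pipeline/<pipeline-id>`; documents indexed with `?pipeline=<pipeline-id>` would then carry the model's prediction under the target field.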
22. Keep your Training Workflow
Train model locally with your preferred ML library -> Transform model into Elastic ML format -> Trained model is deployed in Elasticsearch
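The middle step, transforming a model into a format Elasticsearch can store, amounts to serializing the model's structure as plain data. As a hedged illustration, the sketch below flattens a small hand-built decision tree into a JSON-serializable list of nodes linked by index, then evaluates that list. The node keys are modeled on the `tree_structure` entries of Elasticsearch trained model definitions (`node_index`, `split_feature`, `threshold`, `decision_type`, `left_child`, `right_child`, `leaf_value`), but the exact schema is an assumption here; in practice Eland's import_model() performs this conversion for supported libraries.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical in-memory tree node, standing in for a locally trained model."""
    feature: Optional[int] = None       # index of the split feature; None marks a leaf
    threshold: Optional[float] = None   # go left when value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_value: Optional[float] = None


def flatten(root: Node) -> List[Dict[str, Any]]:
    """Breadth-first walk that turns linked nodes into a flat list linked by index."""
    order = [root]
    flat: List[Dict[str, Any]] = []
    i = 0
    while i < len(order):
        node = order[i]
        if node.feature is None:
            flat.append({"node_index": i, "leaf_value": node.leaf_value})
        else:
            left_index = len(order)  # children take the next two free indices
            order.extend([node.left, node.right])
            flat.append({
                "node_index": i,
                "split_feature": node.feature,
                "threshold": node.threshold,
                "decision_type": "lte",
                "left_child": left_index,
                "right_child": left_index + 1,
            })
        i += 1
    return flat


def predict(flat: List[Dict[str, Any]], features: Sequence[float]) -> float:
    """Evaluate the flat representation, as a server-side evaluator could."""
    node = flat[0]
    while "leaf_value" not in node:
        go_left = features[node["split_feature"]] <= node["threshold"]
        node = flat[node["left_child"] if go_left else node["right_child"]]
    return node["leaf_value"]


# Hand-built tree: feature 0 <= 500 -> 0.0, else feature 1 <= 45 -> 1.0, else 2.0
tree = Node(
    feature=0, threshold=500.0,
    left=Node(leaf_value=0.0),
    right=Node(feature=1, threshold=45.0,
               left=Node(leaf_value=1.0), right=Node(leaf_value=2.0)),
)
flat = flatten(tree)
print(json.dumps(flat, indent=2))  # plain JSON that can be shipped to a server
print(predict(flat, [600.0, 10.0]))
```

Linking nodes by integer index rather than by nesting keeps the serialized form flat, so an evaluator can walk it with a simple loop.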
23. Eland means Elastic and Data
Pandas, NumPy, Jupyter Notebook, scikit-learn, XGBoost, LightGBM
♥+