SlideShare a Scribd company logo
Search Engine Back End API
Solution for Fast Prototyping
https://github.com/jimmylai/
search_engine_backend_template

Introduction and Tutorial
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/23
Requirements:
•  A Backend API provides:
1. 
2. 
3. 
4. 
5. 

Full-text search
Spatial search
RESTful API
Input validation
Output rendering

•  Simple and easy for fast prototyping

LDSP

2
LDSP:
Linux + Django + Solr + Python
•  Django: API serving
–  Input validation
–  Data mash-up
–  Output rendering

•  Solr: Indexing and searching
–  Schema
–  Indexing
–  Query
LDSP

3
Automation:
Shell command + Fabric
•  Installation:
–  Solr
–  Python packages: django, djangorestframework,
pysolr

•  Start services:
–  Solr
–  Django standalone server

•  Data:
–  Create Solr core
–  Feed data
LDSP

4
[Tutorial] Requirement
•  Example Dataset: Taiwan movie dataset
http://data.gov.tw/node/7731

•  Data source
http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx?
type=26&radius=100&format=json

•  We’d like to build a geo search API to
find nearby movie items given a
location

LDSP

5
[Tutorial] Data preprocessing (1/2)
•  Original data
•  {"sno":"29277","mainTitle":"「新南光」電影
《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的
外型遊台灣","year":"未
知","format":"508x640pixels","subject":"戲劇
電影歌仔戲","date":"創作⽇日期:⺠民國時期⺠民國
時期","sourcesite":"http://nrch.cca.gov.tw/
ccahome/photo/photo_meta.jsp?
xml_id=0005780574&dofile=cca100009-hptmtp0281_00-0001w.jpg","px":"120.49","py":"23.853"}
LDSP

6
[Tutorial] Data preprocessing (2/2)
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 

Preprocessed data: normalized the location format
The preprocessed data is stored as data/movie.json
{
"format": "508x640pixels",
"sno": "29277",
"sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?
xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg",
"mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的
外型遊台灣",
"location": "23.853,120.49",
"year": "未知",
"date": "創作⽇日期:⺠民國時期⺠民國時期",
"id": "29277",
"subject": "戲劇電影歌仔戲"
},

LDSP

7
[Tutorial] Data indexing
• 
• 
• 
• 
• 

pip install fabric
fab setup_env
fab create_core:movie
fab run_solr
fab feed_data:movie,data/movie.json
–  Core name as the 1st parameter, data file name
as the 2nd parameter

•  fab run_django
•  We are done!
LDSP

8
[Tutorial] Solr Admin

LDSP

9
[Tutorial] Djangorestframework UI

LDSP

10
[Tutorial] BE API Query, json output

LDSP

11
[Tutorial] BE API Query, xml output

LDSP

12
The next steps
•  Understand the mechanism of:
–  Solr:
•  Update schema
•  Update partial data
•  Query language of Solr

–  Djangorestframework:
•  Add more parameters
•  Input validation
•  Output rendering

•  Deploy to production environment
LDSP

13

More Related Content

What's hot

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introduction
Ageeleshwar K
 
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Big Data Spain
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And Groovy
Ken Kousen
 
Kube-AWS
Kube-AWSKube-AWS
Kube-AWS
CoreOS
 
Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018
Mathieu Herbert
 
CoreOS in a Nutshell
CoreOS in a NutshellCoreOS in a Nutshell
CoreOS in a Nutshell
CoreOS
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
Sadayuki Furuhashi
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
Sematext Group, Inc.
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabricandymccurdy
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
Sematext Group, Inc.
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...
Shih Oon Liong
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
Teguh Nugraha
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systems
m_richardson
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
Yevgeniy Brikman
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
YoungHeon (Roy) Kim
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshop
Fastly
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
Sadayuki Furuhashi
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Fastly
 
Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
Sadayuki Furuhashi
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
NAVER D2
 

What's hot (20)

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introduction
 
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And Groovy
 
Kube-AWS
Kube-AWSKube-AWS
Kube-AWS
 
Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018
 
CoreOS in a Nutshell
CoreOS in a NutshellCoreOS in a Nutshell
CoreOS in a Nutshell
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systems
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshop
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
 

Viewers also liked

Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
Jimmy Lai
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
Jimmy Lai
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
Jimmy Lai
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst NanodegreeJimmy Lai
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Jimmy Lai
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
Jimmy Lai
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
Jimmy Lai
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
Jimmy Lai
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
Jimmy Lai
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
Jimmy Lai
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
Jimmy Lai
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHug
Jimmy Lai
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugJimmy Lai
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
Jimmy Lai
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Jimmy Lai
 

Viewers also liked (15)

Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHug
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 

Similar to [LDSP] Search Engine Back End API Solution for Fast Prototyping

rest3d Web3D 2014
rest3d Web3D 2014rest3d Web3D 2014
rest3d Web3D 2014
Remi Arnaud
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
Jungsu Heo
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & Python
SmartBear
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
Jungsu Heo
 
Barcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkBarcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot Framework
Somkiat Puisungnoen
 
Angular2 inter3
Angular2 inter3Angular2 inter3
Angular2 inter3
Oswald Campesato
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
SATOSHI TAGOMORI
 
Rapid application development with spring roo j-fall 2010 - baris dere
Rapid application development with spring roo   j-fall 2010 - baris dereRapid application development with spring roo   j-fall 2010 - baris dere
Rapid application development with spring roo j-fall 2010 - baris dereBaris Dere
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
disc99_
 
release_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfrelease_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdf
Paul Yang
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
Rinaldi Rampen
 
Apache Solr - search for everyone!
Apache Solr - search for everyone!Apache Solr - search for everyone!
Apache Solr - search for everyone!
Jaran Flaath
 
Apex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXApex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DX
tzm_freedom
 
JRubyConf 2009
JRubyConf 2009JRubyConf 2009
JRubyConf 2009
John Woodell
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open PlatformHideo Kimura
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
LINE Corporation
 
Icinga 2009 at OSMC
Icinga 2009 at OSMCIcinga 2009 at OSMC
Icinga 2009 at OSMC
Icinga
 
Planidoo & Zotonic
Planidoo & ZotonicPlanidoo & Zotonic
Planidoo & Zotonic
David de Boer
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
Lukas Eppler
 
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
SPS Paris
 

Similar to [LDSP] Search Engine Back End API Solution for Fast Prototyping (20)

rest3d Web3D 2014
rest3d Web3D 2014rest3d Web3D 2014
rest3d Web3D 2014
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & Python
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Barcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkBarcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot Framework
 
Angular2 inter3
Angular2 inter3Angular2 inter3
Angular2 inter3
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Rapid application development with spring roo j-fall 2010 - baris dere
Rapid application development with spring roo   j-fall 2010 - baris dereRapid application development with spring roo   j-fall 2010 - baris dere
Rapid application development with spring roo j-fall 2010 - baris dere
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
 
release_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfrelease_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdf
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
 
Apache Solr - search for everyone!
Apache Solr - search for everyone!Apache Solr - search for everyone!
Apache Solr - search for everyone!
 
Apex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXApex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DX
 
JRubyConf 2009
JRubyConf 2009JRubyConf 2009
JRubyConf 2009
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open Platform
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
 
Icinga 2009 at OSMC
Icinga 2009 at OSMCIcinga 2009 at OSMC
Icinga 2009 at OSMC
 
Planidoo & Zotonic
Planidoo & ZotonicPlanidoo & Zotonic
Planidoo & Zotonic
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
 
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 

[LDSP] Search Engine Back End API Solution for Fast Prototyping

  • 1. Search Engine Back End API Solution for Fast Prototyping https://github.com/jimmylai/ search_engine_backend_template Introduction and Tutorial Jimmy Lai r97922028 [at] ntu.edu.tw http://tw.linkedin.com/pub/jimmy-lai/27/4a/536 2013/01/23
  • 2. Requirements: •  A Backend API provides: 1.  2.  3.  4.  5.  Full-text search Spatial search RESTful API Input validation Output rendering •  Simple and easy for fast prototyping LDSP 2
  • 3. LDSP: Linux + Django + Solr + Python •  Django: API serving –  Input validation –  Data mash-up –  Output rendering •  Solr: Indexing and searching –  Schema –  Indexing –  Query LDSP 3
  • 4. Automation: Shell command + Fabric •  Installation: –  Solr –  Python packages: django, djangorestframework, pysolr •  Start services: –  Solr –  Django standalone server •  Data: –  Create Solr core –  Feed data LDSP 4
  • 5. [Tutorial] Requirement •  Example Dataset: Taiwan movie dataset http://data.gov.tw/node/7731 •  Data source http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx? type=26&radius=100&format=json •  We’d like to build a geo search API to find nearby movie items given a location LDSP 5
  • 6. [Tutorial] Data preprocessing (1/2) •  Original data •  {"sno":"29277","mainTitle":"「新南光」電影 《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的 外型遊台灣","year":"未 知","format":"508x640pixels","subject":"戲劇 電影歌仔戲","date":"創作⽇日期:⺠民國時期⺠民國 時期","sourcesite":"http://nrch.cca.gov.tw/ ccahome/photo/photo_meta.jsp? xml_id=0005780574&dofile=cca100009-hptmtp0281_00-0001w.jpg","px":"120.49","py":"23.853"} LDSP 6
  • 7. [Tutorial] Data preprocessing (2/2) •  •  •  •  •  •  •  •  •  •  •  •  •  Preprocessed data: normalized the location format The preprocessed data is stored as data/movie.json { "format": "508x640pixels", "sno": "29277", "sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp? xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg", "mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的 外型遊台灣", "location": "23.853,120.49", "year": "未知", "date": "創作⽇日期:⺠民國時期⺠民國時期", "id": "29277", "subject": "戲劇電影歌仔戲" }, LDSP 7
  • 8. [Tutorial] Data indexing •  •  •  •  •  pip install fabric fab setup_env fab create_core:movie fab run_solr fab feed_data:movie,data/movie.json –  Core name as the 1st parameter, data file name as the 2nd parameter •  fab run_django •  We are done! LDSP 8
  • 11. [Tutorial] BE API Query, json output LDSP 11
  • 12. [Tutorial] BE API Query, xml output LDSP 12
  • 13. The next steps •  Understand the mechanism of: –  Solr: •  Update schema •  Update partial data •  Query language of Solr –  Djangorestframework: •  Add more parameters •  Input validation •  Output rendering •  Deploy to production environment LDSP 13