Solr Extracting Data

A presentation showing how to extract data from Solr. It is part 3 of a three-part series, originally from my YouTube channel.

Published in: Technology, Design

  1. Solr Extracting Data
     ● Start this session with a fully indexed Solr repository
       – Movie cAiYBD4BQeE showed installation
       – Movie Th5Scvlyt-E showed Nutch web crawl
     ● This movie will show how to
       – Extract data from Solr
       – Extract to xml or csv
       – Show the aim of loading into a data warehouse
     ● This movie assumes you know Linux
  2. Solr Extracting Data
     ● Progress so far; the greyed-out area is yet to be examined
  3. Checking Solr Data
     ● Data should have been indexed in Solr
     ● In Solr Admin window
       – Set 'Core Selector' = collection1
       – Click 'Query'
       – In Query window set fl field = url
       – Click Execute Query
     ● The result ( next ) shows the filtered list of urls in Solr
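
The admin console steps above can also be approximated from the command line. This is a minimal sketch, assuming Solr listens on localhost:8983 and that the core is reachable under its name in the URL path; the function names and the choice of wt=json are illustrative assumptions, not taken from the slides.

```shell
#!/bin/bash
# Assumed default: Solr on localhost:8983 (override with SOLR_BASE)
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Print the select URL that mirrors the admin Query form.
#   $1 = core name (the 'Core Selector' value)
#   $2 = comma-separated field list (the fl field)
# wt=json is an assumption; any writer type Solr supports could be used.
admin_query_url() {
    local core="$1" fields="$2"
    printf '%s/%s/select?q=*:*&fl=%s&wt=json\n' "$SOLR_BASE" "$core" "$fields"
}

# Run the query and print the response (needs a running Solr)
run_admin_query() {
    curl -s "$(admin_query_url "$1" "$2")"
}
```

With Solr running, `run_admin_query collection1 url` should return the same filtered list of urls that the Query window shows.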
  4. Checking Solr Data
  5. How To Extract
     ● How could we get at Solr data ?
       – In admin console via query
       – Via http solr select
       – Via curl -o call using solr http select
     ● What format of data suits this purpose ?
       – Xml
       – Comma-separated values (csv)
  6. How To Extract
     ● We want to extract two columns from Solr
       – tstamp, url
     ● We want to extract as csv ( csv in call below could be xml )
     ● We want to extract to a file
     ● So we will use an http call
       – http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv
     ● We will also use a curl call
       – curl -o <csv file> '<http call>'
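
Since the slide notes that csv in the call could be xml, the http call and the curl call can be combined with the format as a parameter. The sketch below assumes the same localhost URL as the slide; the function names and the result.<format> file naming are hypothetical.

```shell
#!/bin/bash
# Build the select URL for a field list and a writer type (csv or xml)
select_url() {
    local fields="$1" format="$2"
    case "$format" in
        csv|xml) ;;
        *) echo "unsupported format: $format" >&2; return 1 ;;
    esac
    printf 'http://localhost:8983/solr/select?q=*:*&fl=%s&wt=%s\n' "$fields" "$format"
}

# Fetch the result into a file named result.<format> (needs a running Solr)
extract_to_file() {
    local fields="$1" format="$2" url
    url="$(select_url "$fields" "$format")" || return 1
    curl -o "result.$format" "$url"
}
```

For example, `extract_to_file tstamp,url csv` issues the same request as the slide, and `extract_to_file tstamp,url xml` switches only the wt parameter.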
  7. How To Extract
     ● Create a bash file in Solr install directory
       – cd solr-4-2-1/extract ; touch solr_url_extract.bash
       – chmod 755 solr_url_extract.bash
     ● Add contents to bash file
       – #!/bin/bash
       – curl -o result.csv 'http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv'
       – mv result.csv result.csv.$(date +"%Y%m%d.%H%M%S")
     ● Now run the bash script
       – ./solr_url_extract.bash
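
The script on this slide renames result.csv even when curl fails. A slightly more defensive variant is sketched below: it downloads to a temporary file and renames it only on success. The function names and the temporary-file approach are assumptions added for illustration; the URL and the timestamp format are those of the slide.

```shell
#!/bin/bash
# Timestamped name in the slide's style: result.csv.YYYYMMDD.HHMMSS
stamped_name() {
    printf 'result.csv.%s\n' "$(date +%Y%m%d.%H%M%S)"
}

# Download to a temporary file, then rename only when curl succeeds
# (needs a running Solr)
extract_urls() {
    local url='http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv'
    local tmp out
    tmp="$(mktemp result.csv.XXXXXX)" || return 1
    out="$(stamped_name)"
    # --fail makes curl return non-zero on an HTTP error response
    if curl --fail --silent --show-error -o "$tmp" "$url"; then
        mv "$tmp" "$out"
        echo "wrote $out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling `extract_urls` at the end of the file makes it a drop-in replacement for the original script body.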
  8. Check Output
     ● Now we check whether we have data
     ● ls -l shows
       – result.csv.20130506.124857
     ● Check the content, wc -l shows 11 lines
     ● Check the content, head -2 shows
       – tstamp, url
       – 2013-05-04T01:56:58.157Z,http://www.mysite.co.nz/Search?DateRange=7& ...
     ● Congratulations, you have extracted data from Solr
     ● It's in CSV format ready to be loaded into a data warehouse
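
The manual checks on this slide can be wrapped in two small helpers. This is a sketch only; the function names are hypothetical, and the header comparison strips spaces so that both "tstamp, url" and "tstamp,url" are accepted.

```shell
#!/bin/bash
# Print the number of data rows (total lines minus the header line)
csv_row_count() {
    local lines
    lines="$(wc -l < "$1")"
    echo $((lines - 1))
}

# Succeed only if the first line names the expected columns.
#   $1 = csv file, $2 = expected header without spaces, e.g. tstamp,url
csv_has_header() {
    local header
    header="$(head -n 1 "$1" | tr -d ' \r')"
    [ "$header" = "$2" ]
}
```

For the file on the slide, `csv_row_count result.csv.20130506.124857` would report 10 rows for the 11 lines seen by wc -l. That is consistent with one header line plus Solr's default of 10 rows per response, so a rows parameter on the select URL would likely be needed to extract more.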
  9. Possible Next Steps
     ● Choose more fields to extract from data
     ● Allow Nutch crawl to go deeper
     ● Allow Nutch crawl to collect a lot more data
     ● Look at facets in Solr data
     ● Load CSV files into Data Warehouse Staging schema
     ● Next movie will show next step in progress
  10. Contact Us
     ● Feel free to contact us at
       – www.semtech-solutions.co.nz
       – info@semtech-solutions.co.nz
     ● We offer IT project consultancy
     ● We are happy to hear about your problems
     ● You pay only for the hours you need to solve your problems