• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Solr Extracting Data
 

Solr Extracting Data

on

  • 850 views

A presentation showing how to extract data from the solr tool, it is part 3 of a three part series. Originally from my youtube channel.

A presentation showing how to extract data from the solr tool, it is part 3 of a three part series. Originally from my youtube channel.

Statistics

Views

Total Views
850
Views on SlideShare
850
Embed Views
0

Actions

Likes
1
Downloads
12
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as OpenOffice

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Solr Extracting Data Solr Extracting Data Presentation Transcript

    • Solr Extracting Data ● Start this session with a full Solr indexed repository – Movie cAiYBD4BQeE showed installation – Movie Th5Scvlyt-E showed Nutch web crawl ● This movie will show how to – Extract data from Solr – Extract to xml or csv – Show aim to load into data warehouse ● This movie assumes you know Linux
    • Solr Extracting Data ● Progress so far, greyed out area yet to be examined
    • Checking Solr Data ● Data should have been indexed in Solr ● In Solr Admin window – Set 'Core Selector' = collection1 – Click 'Query' – In Query window set fl field = url – Click Execute Query ● The result ( next ) shows the filtered list of urls in Solr
    • Checking Solr Data
    • How To Extract ● How could we get at Solr data ? – In admin console via query – Via http solr select – Via curl -o call using solr http select ● What format of data – that suits this purpose – Xml – Comma separated variable (csv)
    • How To Extract ● We want to extract two columns from Solr – tstamp, url ● We want to extract as csv ( csv in call below could be xml ) ● We want to extract to a file ● So we will use an http call – http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv ● We will also use a curl call – curl -o <csv file> '<http call>'
    • How To Extract ● Ceate a bash file in Solr install directory – cd solr-4-2-1/extract ; touch solr_url_extract.bash – chmod 755 solr_url_extract.bash ● Add contents to bash file – #!/bin/bash – curl -o result.csv 'http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv' – mv result.csv result.csv.$(date +”%Y%m%d.%H%M%S”) ● Now run the bash script – ./solr_url_extract.bash
    • Check Output ● Now we check whether we have data ● ls -l shows – result.csv.20130506.124857 ● Check the content , wc -l shows 11 lines ● Check the content , head -2 shows – tstamp, url – 2013-05-04T01:56:58.157Z,http://www.mysite.co.nz/Search? DateRange=7& ... ● Congratulations, you have extracted data from Solr ● It's in CSV format ready to be loaded into a data warehouse
    • Possible Next Steps ● Choose more fields to extract from data ● Allow Nutch crawl to go deeper ● Allow Nutch crawl to collect a lot more data ● Look at facets in Solr data ● Load CSV files into Data Warehouse Staging schema ● Next movie will show next step in progress
    • Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You can just pay for those hours that you need ● To solve your problems