Query breakdown 
Peng Cheng
IMDB
http://www.imdb.com/chart
(sc.parallelize(Seq(null)) +>
  Wget("http://www.imdb.com/chart") !==) // fetch the chart page
  .joinBySlice("div#boxoffice tbody tr") // one row per box-office entry
  .selectInto(
    "rank" -> (_.ownText1("tr td.titleColumn").replaceAll("\"", "").trim),
    "name" -> (_.text1("tr td.titleColumn a")),
    "year" -> (_.text1("tr td.titleColumn span")),
    "box_weekend" -> (_.text("tr td.ratingColumn")(0)),
    "box_gross" -> (_.text("td.ratingColumn")(1)),
    "weeks" -> (_.text1("tr td.weeksColumn"))
  )
  .wgetJoin("tr td.titleColumn a") // fetch each movie's title page
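The chart step turns each box-office `tr` into its own row and pulls several columns out of it. As a rough plain-Scala model (not SpookyStuff's implementation), assume a slice is a map from CSS selector to the texts it matched, so that `text1`/`ownText1` mean "first match, if any" and `text(...)(i)` means "the i-th match". The `Slice` type and every helper name below are hypothetical.

```scala
// Plain-Scala model of selecting columns from one sliced chart row.
// This is NOT the SpookyStuff API; "Slice" and all helpers are hypothetical.
object ChartRowModel {
  // A slice maps a CSS selector to every text it matched, in document order.
  // The entry under "tr td.titleColumn" stands in for the cell's own text (ownText1).
  type Slice = Map[String, Seq[String]]

  // Assumed semantics of text1/ownText1: the first match, if there is one.
  def text1(slice: Slice, selector: String): Option[String] =
    slice.getOrElse(selector, Seq.empty).headOption

  // Assumed semantics of text(...): all matches, so text(...)(i) is the i-th one.
  def text(slice: Slice, selector: String): Seq[String] =
    slice.getOrElse(selector, Seq.empty)

  // Same cleanup as the "rank" expression: drop double quotes, then trim.
  def cleanRank(raw: String): String = raw.replaceAll("\"", "").trim

  // Missing matches become empty strings instead of failing the whole row.
  def selectRow(slice: Slice): Map[String, String] = {
    val ratings = text(slice, "tr td.ratingColumn")
    Map(
      "rank"        -> text1(slice, "tr td.titleColumn").map(cleanRank).getOrElse(""),
      "name"        -> text1(slice, "tr td.titleColumn a").getOrElse(""),
      "box_weekend" -> ratings.lift(0).getOrElse(""),
      "box_gross"   -> ratings.lift(1).getOrElse("")
    )
  }
}
```

Using `lift` rather than `(0)`/`(1)` is a deliberate choice in this sketch: a row with fewer rating cells yields empty columns instead of an `IndexOutOfBoundsException`.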
http://www.imdb.com/title/tt2015381/?ref_=cht_bo_1
  .selectInto(
    "score" -> (_.text1("td#overview-top div.titlePageSprite")),
    "rating_count" -> (_.text1("td#overview-top span[itemprop=ratingCount]")),
    "review_count" -> (_.text1("td#overview-top span[itemprop=reviewCount]"))
  )
  .wgetLeftJoin("div#maindetails_quicklinks a:contains(Reviews)")
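This step switches from `wgetJoin` to `wgetLeftJoin` for the Reviews link. The slides do not define the difference; judging only by the names, a plausible reading is relational: an inner join drops a row whose selector matches no link, while a left join keeps it with an empty value. The sketch below models that assumption with plain collections; `Row`, `innerJoin` and `leftJoin` are hypothetical names.

```scala
// Plain-Scala model of inner vs. left "joins" on scraped links.
// Assumption: wgetJoin ~ inner join, wgetLeftJoin ~ left outer join.
object JoinModel {
  final case class Row(fields: Map[String, String], links: Seq[String])

  // Inner join: one output row per matched link; rows with no link disappear.
  def innerJoin(rows: Seq[Row]): Seq[(Map[String, String], String)] =
    for {
      row  <- rows
      link <- row.links
    } yield (row.fields, link)

  // Left join: like innerJoin, but a row with no link survives with None.
  def leftJoin(rows: Seq[Row]): Seq[(Map[String, String], Option[String])] =
    rows.flatMap { row =>
      if (row.links.isEmpty) Seq((row.fields, Option.empty[String]))
      else row.links.map(link => (row.fields, Option(link)))
    }
}
```

Under this reading, a movie whose page has no Reviews link would still appear in the final output, just with empty review columns.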
http://www.imdb.com/title/tt2015381/reviews?ref_=tt_ql_8
  .wgetInsertPagination("div#tn15content a:has(img[alt~=Next])", 500)
  .joinBySlice("div#tn15content div:has(h2)")
  .selectInto(
    "review_rating" -> (_.attr1("img[alt]", "alt")),
    "review_title" -> (_.text1("h2")),
    "review_meta" -> (_.text("small").toString())
  )
  .wgetLeftJoin("a")
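`wgetInsertPagination(selector, 500)` follows the "Next" link repeatedly. The slides do not say what the second argument means; the sketch below assumes it caps the number of pages fetched. It models pages as an in-memory map instead of real HTTP requests (`Page` and `paginate` are hypothetical), and it also stops if a "Next" link points back to an already-visited page.

```scala
// Plain-Scala model of "keep clicking Next" pagination.
// Assumption: the numeric argument (500 in the slide) is a maximum page count.
object PaginationModel {
  final case class Page(url: String, next: Option[String])

  def paginate(start: String, site: Map[String, Page], maxPages: Int): Vector[Page] = {
    @annotation.tailrec
    def loop(url: Option[String], seen: Set[String], acc: Vector[Page]): Vector[Page] =
      url match {
        case Some(u) if acc.size < maxPages && !seen(u) && site.contains(u) =>
          val page = site(u)
          loop(page.next, seen + u, acc :+ page)
        // No Next link, cap reached, cycle detected, or unknown page: stop here.
        case _ => acc
      }
    loop(Some(start), Set.empty, Vector.empty)
  }
}
```

The visited-set check is an extra safeguard in this sketch; a real site that links "Next" back to an earlier page would otherwise burn through the whole page budget.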
http://www.imdb.com/user/ur23582121/
  .selectInto(
    "user_name" -> (_.text1("div.user-profile h1")),
    "user_timestamp" -> (_.text1("div.user-profile div.timestamp")),
    "user_post_count" -> (_.ownText1("div.user-lists div.see-more")),
    "user_rating_count" -> (_.text1("div.ratings div.see-more")),
    "user_review_count" -> (_.text1("div.reviews div.see-more")),
    "user_rating_histogram" -> (_.attr("div.overall div.histogram-horizontal a", "title").toString())
  )
  .asTsvRDD() // Output as TSV file
  .collect()
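The IMDB query ends with `.asTsvRDD()` and `.collect()`: each accumulated row becomes one tab-separated line, and the lines are brought back to the driver. How SpookyStuff orders columns or escapes values is not shown; the sketch below assumes a fixed column list and replaces embedded tabs and line breaks with spaces so every record stays on one line. `TsvModel.toTsv` is a hypothetical helper, not part of SpookyStuff.

```scala
// Minimal TSV rendering of scraped rows (hypothetical helper, not SpookyStuff's).
object TsvModel {
  // Tabs or line breaks inside a value would corrupt the layout, so flatten them.
  private def clean(value: String): String = value.replaceAll("[\t\r\n]+", " ")

  def toTsv(columns: Seq[String], rows: Seq[Map[String, String]]): Seq[String] = {
    val header = columns.mkString("\t")
    // Missing fields (e.g. from a left join that found no link) become empty cells.
    val body = rows.map(row => columns.map(c => clean(row.getOrElse(c, ""))).mkString("\t"))
    header +: body
  }
}
```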
How to test
1. Open http://ec2-54-88-40-125.compute-1.amazonaws.com:8888/notebooks/all_inclusive_demo.ipynb# in your browser.
2. Find the "IMDB review extraction" section.
3. Run it and wait for the results.
4. Open http://ec2-54-88-40-125.compute-1.amazonaws.com:4040/stages/ (the Spark web UI) to follow the job's progress.
Rotten Tomatoes
http://www.rottentomatoes.com/
(sc.parallelize(Seq(null)) +>
  Wget("http://www.rottentomatoes.com/") !==) // fetch the front page
  .wgetJoin("table.top_box_office tr.sidebarInTheaterTopBoxOffice a", indexKey = "rank")
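Here `indexKey = "rank"` stores each link's position among the matches in a `rank` column, which is how the box-office order survives the join. Whether the index starts at 0 or 1 is not stated in the slides; the sketch below assumes 0-based positions, as Scala's `zipWithIndex` produces, and models links as plain strings. `joinWithIndex` and the `url` column are hypothetical.

```scala
// Model of a join that records each link's position under an index key.
// Assumption: positions are 0-based; SpookyStuff's convention is not shown.
object IndexKeyModel {
  def joinWithIndex(links: Seq[String], indexKey: String): Seq[Map[String, String]] =
    links.zipWithIndex.map { case (link, i) =>
      Map("url" -> link, indexKey -> i.toString)
    }
}
```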
http://www.rottentomatoes.com/m/guardians_of_the_galaxy/
  .selectInto(
    "name" -> (_.text1("h1.movie_title")),
    "meter" -> (_.text1("div#all-critics-numbers span#all-critics-meter")),
    "rating" -> (_.text1("div#all-critics-numbers p.critic_stats span")),
    "review_count" -> (_.text1("div#all-critics-numbers p.critic_stats span[itemprop=reviewCount]"))
  )
  .wgetJoin("div#contentReviews h3 a")
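On the movie page, `selectInto` adds `name`, `meter`, `rating` and `review_count` to a row that presumably already carries the `rank` recorded on the front page. A minimal model of that accumulation, assuming a newer value wins when two steps use the same column name (the slides do not say); `RowAccumulation` is a hypothetical name:

```scala
// Model of how successive selectInto calls could accumulate columns in one row.
// Assumption: on a key clash the newer value wins (Map ++ semantics).
object RowAccumulation {
  def selectInto(row: Map[String, String], fields: (String, String)*): Map[String, String] =
    row ++ fields
}
```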
http://www.rottentomatoes.com/m/guardians_of_the_galaxy/reviews/
  .wgetInsertPagination("div.scroller a.right", indexKey = "page") // grab all pages by following the right-arrow button
  .joinBySlice("div#reviews div.media_block") // slice into review blocks
  .selectInto(
    "critic_name" -> (_.text1("div.criticinfo strong a")),
    "critic_org" -> (_.text1("div.criticinfo em.subtle")),
    "critic_review" -> (_.text1("div.reviewsnippet p")),
    "critic_score" -> (_.ownText1("div.reviewsnippet p.subtle"))
  )
  .wgetJoin("div.criticinfo strong a")
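On the reviews page, pagination and slicing combine: every fetched page is tagged with its position under `indexKey = "page"`, then split into one row per `div.media_block`. The sketch models pages as lists of review-block texts; the 0-based page numbering and the `ReviewSlicing`/`sliceAll` names are assumptions for illustration.

```scala
// Model: paginated review pages, each sliced into review blocks.
// Assumption: each output row carries its 0-based page index under "page".
object ReviewSlicing {
  def sliceAll(pages: Seq[Seq[String]]): Seq[Map[String, String]] =
    for {
      (blocks, pageIdx) <- pages.zipWithIndex
      block             <- blocks // one output row per review block
    } yield Map("page" -> pageIdx.toString, "critic_review" -> block)
}
```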
http://www.rottentomatoes.com/critic/sean-means/
  .selectInto(
    "total_reviews_ratings" -> (_.text("div.media_block div.clearfix dd").toString())
  )
  .asJsonRDD() // Output as JSON
  .collect()
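The Rotten Tomatoes query ends with `.asJsonRDD()` instead of `.asTsvRDD()`. SpookyStuff's exact JSON layout is not shown in the slides; the sketch below renders one flat row as a JSON object with standard string escaping and sorted keys, purely to illustrate the output shape. `JsonModel` is hypothetical, not a SpookyStuff or Spark API.

```scala
// Minimal JSON-object rendering of a flat row (hypothetical, not SpookyStuff's).
object JsonModel {
  // Escape quotes, backslashes and control characters as JSON requires.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append('u').append(f"${c.toInt}%04x")
      case c => sb.append(c)
    }
    sb.append('"').toString
  }

  // Keys are sorted so the output is deterministic.
  def toJson(row: Map[String, String]): String =
    row.toSeq.sortBy(_._1)
      .map { case (k, v) => s"${quote(k)}:${quote(v)}" }
      .mkString("{", ",", "}")
}
```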
How to test
1. Open http://ec2-54-88-40-125.compute-1.amazonaws.com:8888/notebooks/all_inclusive_demo.ipynb# in your browser.
2. Find the "Rotten Tomatoes Review Extraction" section.
3. Run it and wait for the results.
4. Open http://ec2-54-88-40-125.compute-1.amazonaws.com:4040/stages/ (the Spark web UI) to follow the job's progress.
