
A Journey through New Languages - Rancho Dev 2017

A journey learning Crystal and Elixir and comparing them with Ruby, and why raw performance alone is not enough to compare languages.


  1. LANGUAGES: A JOURNEY @akitaonrails
  2. LANGUAGES: A JOURNEY RANCHO DEV 2017 @akitaonrails
  4. www.theconf.club
  5. Language syntax is EASY. Architectures (PATTERNS) are HARD.
  6. git checkout -b old_version remotes/origin/old_version
  7. time bin/manga-downloadr -t
  8. #!/usr/bin/env ruby
     $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
     require 'optparse'

     options = { test: false }
     option_parser = OptionParser.new do |opts|
       opts.banner = "Usage: manga-downloadr [options]"
       opts.on("-t", "--test", "Test routine") do |t|
         options[:url]       = "http://www.mangareader.net/onepunch-man"
         options[:name]      = "one-punch-man"
         options[:directory] = "/tmp/manga-downloadr/one-punch-man"
         options[:test]      = true
       end
       opts.on("-u URL", "--url URL", "Full MangaReader.net manga homepage URL - required") do |v|
         options[:url] = v
       end
       opts.on("-n NAME", "--name NAME", "slug to be used for the sub-folder to store all manga files - required") do |n|
         options[:name] = n
       end
       opts.on("-d DIRECTORY", "--directory DIRECTORY", "main folder where all mangas will be stored - required") do |d|
         options[:directory] = d
       end
       opts.on("-h", "--help", "Show this message") do
         puts opts
         exit
       end
     end
  9. require 'manga-downloadr'

     generator = MangaDownloadr::Workflow.create(options[:url], options[:name], options[:directory])
     generator.fetch_chapter_urls!
     generator.fetch_page_urls!
     generator.fetch_image_urls!
     generator.fetch_images!
     generator.compile_ebooks!
  10. require 'manga-downloadr'

      generator = MangaDownloadr::Workflow.create(options[:url], options[:name], options[:directory])
      puts "Massive parallel scanning of all chapters "
      generator.fetch_chapter_urls!
      puts "\nMassive parallel scanning of all pages "
      generator.fetch_page_urls!
      puts "\nMassive parallel scanning of all images "
      generator.fetch_image_urls!
      puts "\nTotal page links found: #{generator.chapter_pages_count}"
      puts "\nMassive parallel download of all page images "
      generator.fetch_images!
      puts "\nCompiling all images into PDF volumes "
      generator.compile_ebooks!
      puts "\nProcess finished."
  11. require 'manga-downloadr'

      generator = MangaDownloadr::Workflow.create(options[:url], options[:name], options[:directory])
      unless generator.state?(:chapter_urls)
        puts "Massive parallel scanning of all chapters "
        generator.fetch_chapter_urls!
      end
      unless generator.state?(:page_urls)
        puts "\nMassive parallel scanning of all pages "
        generator.fetch_page_urls!
      end
      unless generator.state?(:image_urls)
        puts "\nMassive parallel scanning of all images "
        generator.fetch_image_urls!
        puts "\nTotal page links found: #{generator.chapter_pages_count}"
      end
      unless generator.state?(:images)
        puts "\nMassive parallel download of all page images "
        generator.fetch_images!
      end
      unless options[:test]
        puts "\nCompiling all images into PDF volumes "
        generator.compile_ebooks!
      end
      puts "\nProcess finished."
  12. module MangaDownloadr
        ImageData = Struct.new(:folder, :filename, :url)

        class Workflow
          def initialize(root_url = nil, manga_name = nil, manga_root = nil, options = {}); end

          def fetch_chapter_urls!; end
          def fetch_page_urls!;    end
          def fetch_image_urls!;   end
          def fetch_images!;       end
          def compile_ebooks!;     end
          def state?(state);       end

          private

          def current_state(state); end
        end
      end
  14. def fetch_chapter_urls!
        doc = Nokogiri::HTML(open(manga_root_url))
        self.chapter_list = doc.css("#listing a").map { |l| l['href'] }
        self.manga_title  = doc.css("#mangaproperties h1").first.text
        current_state :chapter_urls
      end
  16. def fetch_page_urls!
        chapter_list.each do |chapter_link|
          response = Typhoeus.get "http://www.mangareader.net#{chapter_link}"
          chapter_doc = Nokogiri::HTML(response.body)
          pages = chapter_doc.xpath("//div[@id='selectpage']//select[@id='pageMenu']//option")
          chapter_pages.merge!(chapter_link => pages.map { |p| p['value'] })
          print '.'
        end
        self.chapter_pages_count = chapter_pages.values.inject(0) { |total, list| total += list.size }
        current_state :page_urls
      end
  17. def fetch_page_urls!
        chapter_list.each do |chapter_link|
          begin
            response = Typhoeus.get "http://www.mangareader.net#{chapter_link}"
            begin
              chapter_doc = Nokogiri::HTML(response.body)
              pages = chapter_doc.xpath("//div[@id='selectpage']//select[@id='pageMenu']//option")
              chapter_pages.merge!(chapter_link => pages.map { |p| p['value'] })
              print '.'
            rescue => e
              self.fetch_page_urls_errors << { url: chapter_link, error: e, body: response.body }
              print 'x'
            end
          rescue => e
            puts e
          end
        end
        unless fetch_page_urls_errors.empty?
          puts "\nErrors fetching page urls:"
          puts fetch_page_urls_errors
        end
        self.chapter_pages_count = chapter_pages.values.inject(0) { |total, list| total += list.size }
        current_state :page_urls
      end
  18. def fetch_page_urls!
        hydra = Typhoeus::Hydra.new(max_concurrency: hydra_concurrency)
        chapter_list.each do |chapter_link|
          begin
            request = Typhoeus::Request.new "http://www.mangareader.net#{chapter_link}"
            request.on_complete do |response|
              begin
                chapter_doc = Nokogiri::HTML(response.body)
                pages = chapter_doc.xpath("//div[@id='selectpage']//select[@id='pageMenu']//option")
                chapter_pages.merge!(chapter_link => pages.map { |p| p['value'] })
                print '.'
              rescue => e
                self.fetch_page_urls_errors << { url: chapter_link, error: e, body: response.body }
                print 'x'
              end
            end
            hydra.queue request
          rescue => e
            puts e
          end
        end
        hydra.run
        unless fetch_page_urls_errors.empty?
          puts "\nErrors fetching page urls:"
          puts fetch_page_urls_errors
        end
        self.chapter_pages_count = chapter_pages.values.inject(0) { |total, list| total += list.size }
        current_state :page_urls
      end
  27. def fetch_image_urls!
        hydra = Typhoeus::Hydra.new(max_concurrency: hydra_concurrency)
        chapter_list.each do |chapter_key|
          chapter_pages[chapter_key].each do |page_link|
            begin
              request = Typhoeus::Request.new "http://www.mangareader.net#{page_link}"
              request.on_complete do |response|
                begin
                  chapter_doc = Nokogiri::HTML(response.body)
                  image = chapter_doc.css('#img').first
                  tokens = image['alt'].match(/^(.*?)\s-\s(.*?)$/)
                  extension = File.extname(URI.parse(image['src']).path)
                  chapter_images.merge!(chapter_key => []) if chapter_images[chapter_key].nil?
                  chapter_images[chapter_key] << ImageData.new(tokens[1], "#{tokens[2]}#{extension}", image['src'])
                  print '.'
                rescue => e
                  self.fetch_image_urls_errors << { url: page_link, error: e }
                  print 'x'
                end
              end
              hydra.queue request
            rescue => e
              puts e
            end
          end
        end
        hydra.run
        unless fetch_image_urls_errors.empty?
          puts "\nErrors fetching image urls:"
          puts fetch_image_urls_errors
        end
        current_state :image_urls
      end
  29. def fetch_images!
        hydra = Typhoeus::Hydra.new(max_concurrency: hydra_concurrency)
        chapter_list.each_with_index do |chapter_key, chapter_index|
          chapter_images[chapter_key].each do |file|
            downloaded_filename = File.join(manga_root_folder, file.folder, file.filename)
            next if File.exists?(downloaded_filename) # effectively resumes the download list without re-downloading everything
            request = Typhoeus::Request.new file.url
            request.on_complete do |response|
              begin
                # download
                FileUtils.mkdir_p(File.join(manga_root_folder, file.folder))
                File.open(downloaded_filename, "wb+") { |f| f.write response.body }
                unless is_test
                  # resize
                  image = Magick::Image.read(downloaded_filename).first
                  resized = image.resize_to_fit(600, 800)
                  resized.write(downloaded_filename) { self.quality = 50 }
                  GC.start # to avoid a leak too big (ImageMagick is notorious for that, specially on resizes)
                end
                print '.'
              rescue => e
                self.fetch_images_errors << { url: file.url, error: e }
                print '#'
              end
            end
            hydra.queue request
          end
        end
        hydra.run
        unless fetch_images_errors.empty?
          puts "\nErrors downloading images:"
          puts fetch_images_errors
        end
        current_state :images
      end
  30. def compile_ebooks!
        folders = Dir[manga_root_folder + "/*/"].sort_by { |element| element.split(" ").last.to_i }
        self.download_links = folders.inject([]) do |list, folder|
          list += Dir[folder + "*.*"].sort_by { |element| element.split(" ").last.to_i }
        end

        # concatenating PDF files (250 pages per volume)
        chapter_number = 0
        while !download_links.empty?
          chapter_number += 1
          pdf_file = File.join(manga_root_folder, "#{manga_title} #{chapter_number}.pdf")
          list = download_links.slice!(0..pages_per_volume)
          Prawn::Document.generate(pdf_file, page_size: page_size) do |pdf|
            list.each do |image_file|
              begin
                pdf.image image_file, position: :center, vposition: :center
              rescue => e
                puts "Error in #{image_file} - #{e}"
              end
            end
          end
          print '.'
        end
        current_state :ebooks
      end
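An aside on the chunking above: slice!(0..pages_per_volume) is destructive and, because the range is inclusive, actually takes pages_per_volume + 1 entries per volume. A non-destructive sketch of the same chunking with Enumerable#each_slice (variable names here are illustrative, not from the original code):

      # Chunk a flat, sorted list of image paths into 250-page volumes
      # without mutating the source array.
      download_links.each_slice(250).with_index(1) do |volume_pages, volume_number|
        puts "Volume #{volume_number}: #{volume_pages.size} pages"
        # each volume_pages array would feed one Prawn::Document.generate call
      end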
  31. time bin/manga-downloadr -t
      17.18s user 17.62s system 41% cpu 1:24.04 total
  33. 33. . !"" _build # $"" ... !"" config # $"" config.exs !"" deps # !"" ... !"" ex_manga_downloadr !"" lib # !"" ex_manga_downloadr # # !"" cli.ex # # !"" mangafox # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" mangareader # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" manga_wrapper.ex # # $"" workflow.ex # $"" ex_manga_downloadr.ex !"" mix.exs !"" mix.lock !"" README.md $"" test !"" ex_manga_downloadr # !"" mangafox_test.exs # $"" mangareader_test.exs !"" ex_manga_downloadr_test.exs $"" test_helper.exs 61 directories, 281 files mix.exs
  34. # mix.exs
      defmodule ExMangaDownloadr.Mixfile do
        use Mix.Project

        def project do
          [app: :ex_manga_downloadr,
           version: "1.0.2",
           elixir: "~> 1.4",
           build_embedded: Mix.env == :prod,
           start_permanent: Mix.env == :prod,
           escript: [main_module: ExMangaDownloadr.CLI],
           deps: deps()]
        end

        def application do
          [applications: [:logger, :httpoison, :porcelain, :observer]]
        end

        defp deps do
          [
            {:httpoison, "~> 0.11"},
            {:floki, "~> 0.17"},
            {:porcelain, "~> 2.0.3"},
            {:mock, "~> 0.2", only: :test}
          ]
        end
      end
  37. # workflow.ex
      defmodule ExMangaDownloadr.Workflow do
        def determine_source(url) do
        end

        def chapters({url, source}) do
          {:ok, {_manga_title, chapter_list}} = MangaWrapper.index_page(url, source)
          {chapter_list, source}
        end

        def pages({chapter_list, source}) do
          pages_list = chapter_list
            |> Task.async_stream(MangaWrapper, :chapter_page, [source], max_concurrency: @max_demand)
            |> Enum.to_list()
            |> Enum.reduce([], fn {:ok, {:ok, list}}, acc -> acc ++ list end)
          {pages_list, source}
        end

        def images_sources({pages_list, source}) do
          pages_list
          |> Task.async_stream(MangaWrapper, :page_image, [source], max_concurrency: @max_demand)
          |> Enum.to_list()
          |> Enum.map(fn {:ok, {:ok, image}} -> image end)
        end

        def process_downloads(images_list, directory) do
          images_list
          |> Task.async_stream(MangaWrapper, :page_download_image, [directory],
               max_concurrency: @max_demand / 2, timeout: @download_timeout)
          |> Enum.to_list()
          directory
        end

        def optimize_images(directory) do
          …
        end

        def compile_pdfs(directory, manga_name) do
          …
        end

        defp compile_volume(manga_name, directory, {chunk, index}) do
          …
        end

        defp prepare_volume(manga_name, directory, chunk, index) do
          …
        end

        defp chunk(collection, default_size) do
          …
        end
      end
  41. POOL
  43. # manga_wrapper.ex
      defmodule MangaWrapper do
        require Logger

        def index_page(url, source) do
          source |> manga_source("IndexPage") |> apply(:chapters, [url])
        end

        def chapter_page(chapter_link, source) do
          source |> manga_source("ChapterPage") |> apply(:pages, [chapter_link])
        end

        def page_image(page_link, source) do
          source |> manga_source("Page") |> apply(:image, [page_link])
        end

        def page_download_image(image_data, directory) do
          download_image(image_data, directory)
        end

        defp manga_source(source, module) do
          case source do
            "mangareader" -> :"Elixir.ExMangaDownloadr.MangaReader.#{module}"
            "mangafox"    -> :"Elixir.ExMangaDownloadr.Mangafox.#{module}"
          end
        end

        defp download_image({image_src, image_filename}, directory) do
        end
      end
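The manga_source/2 trick above builds a module name from a string at runtime (any atom prefixed with Elixir. names a module). The Ruby counterpart of that dynamic dispatch would be constant lookup; a minimal sketch, assuming hypothetical MangaDownloadr::MangaReader::IndexPage and MangaDownloadr::Mangafox::IndexPage modules:

      # Resolve "mangareader"/"mangafox" plus a page-class name to a module,
      # mirroring the Elixir manga_source/2 dispatch.
      def manga_source(source, page_class)
        namespace = { "mangareader" => "MangaReader", "mangafox" => "Mangafox" }.fetch(source)
        Object.const_get("MangaDownloadr::#{namespace}::#{page_class}")
      end

      # manga_source("mangareader", "IndexPage").chapters(url)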
  48. defmodule ExMangaDownloadr.Mangafox.ChapterPage do
        require Logger
        require ExMangaDownloadr

        def pages(chapter_link) do
          ExMangaDownloadr.fetch chapter_link, do: fetch_pages(chapter_link)
        end

        defp fetch_pages(html, chapter_link) do
          [_page | link_template] = chapter_link |> String.split("/") |> Enum.reverse

          html
          |> Floki.find("div[id='top_center_bar'] option")
          |> Floki.attribute("value")
          |> Enum.reject(fn page_number -> page_number == "0" end)
          |> Enum.map(fn page_number ->
            ["#{page_number}.html" | link_template] |> Enum.reverse |> Enum.join("/")
          end)
        end
      end
  50. # cli.ex
      defmodule ExMangaDownloadr.CLI do
        alias ExMangaDownloadr.Workflow
        require ExMangaDownloadr

        def main(args) do
          args |> parse_args |> process
        end

        ...

        defp parse_args(args) do
        end

        defp process(:help) do
        end

        defp process(directory, url) do
          File.mkdir_p!(directory)
          File.mkdir_p!("/tmp/ex_manga_downloadr_cache")
          manga_name = directory |> String.split("/") |> Enum.reverse |> Enum.at(0)

          url
          |> Workflow.determine_source
          |> Workflow.chapters
          |> Workflow.pages
          |> Workflow.images_sources
          |> Workflow.process_downloads(directory)
          |> Workflow.optimize_images
          |> Workflow.compile_pdfs(manga_name)
          |> finish_process
        end

        defp process_test(directory, url) do
        end

        defp finish_process(directory) do
        end
      end
  51. mix deps.get
      mix test
      mix escript.build

      ex_manga_downloadr - 4.6M
  53. time ./ex_manga_downloadr --test
      32.03s user 57.97s system 120% cpu 1:14.45 total
  55. 55. . !"" _build # $"" ... !"" config # $"" config.exs !"" deps # !"" ... !"" ex_manga_downloadr !"" lib # !"" ex_manga_downloadr # # !"" cli.ex # # !"" mangafox # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" mangareader # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" manga_wrapper.ex # # $"" workflow.ex # $"" ex_manga_downloadr.ex !"" mix.exs !"" mix.lock !"" README.md $"" test !"" ex_manga_downloadr # !"" mangafox_test.exs # $"" mangareader_test.exs !"" ex_manga_downloadr_test.exs $"" test_helper.exs 61 directories, 281 files . !"" cr_manga_downloadr !"" libs # !"" ... !"" LICENSE !"" README.md !"" shard.lock !"" shard.yml !"" spec # !"" cr_manga_downloadr # # !"" chapters_spec.cr # # !"" concurrency_spec.cr # # !"" image_downloader_spec.cr # # !"" page_image_spec.cr # # $"" pages_spec.cr # !"" fixtures # # !"" ... # $"" spec_helper.cr $"" src !"" cr_manga_downloadr # !"" chapters.cr # !"" concurrency.cr # !"" downloadr_client.cr # !"" image_downloader.cr # !"" page_image.cr # !"" pages.cr # !"" records.cr # !"" version.cr # $"" workflow.cr $"" cr_manga_downloadr.cr
  56. 56. . !"" _build # $"" ... !"" config # $"" config.exs !"" deps # !"" ... !"" ex_manga_downloadr !"" lib # !"" ex_manga_downloadr # # !"" cli.ex # # !"" mangafox # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" mangareader # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" manga_wrapper.ex # # $"" workflow.ex # $"" ex_manga_downloadr.ex !"" mix.exs !"" mix.lock !"" README.md $"" test !"" ex_manga_downloadr # !"" mangafox_test.exs # $"" mangareader_test.exs !"" ex_manga_downloadr_test.exs $"" test_helper.exs 61 directories, 281 files . !"" cr_manga_downloadr !"" libs # !"" ... !"" LICENSE !"" README.md !"" shard.lock !"" shard.yml !"" spec # !"" cr_manga_downloadr # # !"" chapters_spec.cr # # !"" concurrency_spec.cr # # !"" image_downloader_spec.cr # # !"" page_image_spec.cr # # $"" pages_spec.cr # !"" fixtures # # !"" ... # $"" spec_helper.cr $"" src !"" cr_manga_downloadr # !"" chapters.cr # !"" concurrency.cr # !"" downloadr_client.cr # !"" image_downloader.cr # !"" page_image.cr # !"" pages.cr # !"" records.cr # !"" version.cr # $"" workflow.cr $"" cr_manga_downloadr.cr
  57. # Elixir
      File.mkdir_p!(directory)
      File.mkdir_p!("/tmp/ex_manga_downloadr_cache")
      manga_name = directory |> String.split("/") |> Enum.reverse |> Enum.at(0)

      url
      |> Workflow.determine_source
      |> Workflow.chapters
      |> Workflow.pages
      |> Workflow.images_sources
      |> Workflow.process_downloads(directory)
      |> Workflow.optimize_images
      |> Workflow.compile_pdfs(manga_name)
      |> finish_process

      # Crystal
      def run
        Dir.mkdir_p @config.download_directory
        pipe Steps.fetch_chapters(@config)
          .>> Steps.fetch_pages(@config)
          .>> Steps.fetch_images(@config)
          .>> Steps.download_images(@config)
          .>> Steps.optimize_images(@config)
          .>> Steps.prepare_volumes(@config)
          .>> unwrap
        puts "Done!"
      end
  59. # 1
      y = c(b(a))

      # 2
      x = b(a)
      y = c(x)

      # Elixir Pipes
      y = a |> b |> c

      # Crystal Macro Pipes
      y = pipe a .>> b .>> c .>> unwrap
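Ruby itself has no pipe operator, but later Ruby versions (2.6+) offer Object#then (an alias of yield_self), which reads left-to-right in much the same way; a minimal sketch with placeholder b and c:

      def b(x); x * 2; end  # placeholder steps, for illustration only
      def c(x); x + 1; end

      a = 10
      y = a.then { |x| b(x) }.then { |x| c(x) }
      # => 21, same as y = c(b(a))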
  60. defmodule ExMangaDownloadr.MangaReader.IndexPage do
        require Logger
        require ExMangaDownloadr

        def chapters(manga_root_url) do
          ExMangaDownloadr.fetch manga_root_url, do: collect
        end

        defp collect(html) do
          {fetch_manga_title(html), fetch_chapters(html)}
        end

        defp fetch_manga_title(html) do
          html |> Floki.find("#mangaproperties h1") |> Floki.text
        end

        defp fetch_chapters(html) do
          html |> Floki.find("#listing a") |> Floki.attribute("href")
        end
      end
  63. require "./downloadr_client"
      require "xml"

      module CrMangaDownloadr
        class Chapters < DownloadrClient
          def fetch
            html = get(@config.root_uri).as(XML::Node)
            nodes = html.xpath_nodes("//table[contains(@id, 'listing')]//td//a/@href")
            nodes.map { |node| node.text.as(String) }
          end
        end
      end
  65. module CrMangaDownloadr
        class DownloadrClient
          ...
          def get(uri : String, binary = false)
            Dir.mkdir_p(@config.cache_directory) unless Dir.exists?(@config.cache_directory)
            cache_path = File.join(@config.cache_directory, cache_filename(uri))
            while true
              begin
                response = if @cache_http && File.exists?(cache_path)
                  body = File.read(cache_path)
                  HTTP::Client::Response.new(200, body)
                else
                  @http_client.get(uri, headers: HTTP::Headers{
                    "User-Agent" => CrMangaDownloadr::USER_AGENT })
                end
                case response.status_code
                when 301
                  uri = response.headers["Location"]
                when 200
                  if (binary || @cache_http) && !File.exists?(cache_path)
                    File.open(cache_path, "w") do |f|
                      f.print response.body
                    end
                  end
                  if binary
                    return cache_path
                  else
                    return XML.parse_html(response.body)
                  end
                end
              rescue IO::Timeout
                puts "Sleeping over #{uri}"
                sleep 1
              end
            end
          end
          ...
        end
      end
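The Ruby port has a downloadr_client.rb counterpart that the slides do not show; a minimal sketch of the same cache-then-fetch-with-redirects idea in plain Ruby, using Net::HTTP (names are illustrative assumptions, not the gem's actual API):

      require "net/http"
      require "digest"
      require "fileutils"

      # Cached GET with redirect-following, mirroring the Crystal
      # DownloadrClient#get above.
      def cached_get(uri, cache_dir)
        FileUtils.mkdir_p(cache_dir)
        cache_path = File.join(cache_dir, Digest::MD5.hexdigest(uri))
        return File.read(cache_path) if File.exist?(cache_path)

        response = Net::HTTP.get_response(URI(uri))
        case response
        when Net::HTTPRedirection
          cached_get(response["Location"], cache_dir)
        when Net::HTTPSuccess
          File.write(cache_path, response.body)
          response.body
        end
      end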
  71. 71. require "fiberpool" module CrMangaDownloadr struct Concurrency(A, B) def initialize(@config : Config, @engine_class : DownloadrClient.class) end def fetch(collection : Array(A)?, &block : A, DownloadrClient -> Array(B)?) : Array(B) results = [] of B if collection pool = Fiberpool.new(collection, @config.download_batch_size) pool.run do |item| engine = @engine_class.new(@config) if reply = block.call(item, engine) results.concat(reply) end end end results end end end fetch Concurrency
  72. 72. require "fiberpool" module CrMangaDownloadr struct Concurrency(A, B) def initialize(@config : Config, @engine_class : DownloadrClient.class) end def fetch(collection : Array(A)?, &block : A, DownloadrClient -> Array(B)?) : Array(B) results = [] of B if collection pool = Fiberpool.new(collection, @config.download_batch_size) pool.run do |item| engine = @engine_class.new(@config) if reply = block.call(item, engine) results.concat(reply) end end end results end end end fetch Concurrency
  73. 73. require "fiberpool" module CrMangaDownloadr struct Concurrency(A, B) def initialize(@config : Config, @engine_class : DownloadrClient.class) end def fetch(collection : Array(A)?, &block : A, DownloadrClient -> Array(B)?) : Array(B) results = [] of B if collection pool = Fiberpool.new(collection, @config.download_batch_size) pool.run do |item| engine = @engine_class.new(@config) if reply = block.call(item, engine) results.concat(reply) end end end results end end end fetch Concurrency
  74. 74. require "fiberpool" module CrMangaDownloadr struct Concurrency(A, B) def initialize(@config : Config, @engine_class : DownloadrClient.class) end def fetch(collection : Array(A)?, &block : A, DownloadrClient -> Array(B)?) : Array(B) results = [] of B if collection pool = Fiberpool.new(collection, @config.download_batch_size) pool.run do |item| engine = @engine_class.new(@config) if reply = block.call(item, engine) results.concat(reply) end end end results end end end fetch Concurrency
  75. 75. require "fiberpool" module CrMangaDownloadr struct Concurrency(A, B) def initialize(@config : Config, @engine_class : DownloadrClient.class) end def fetch(collection : Array(A)?, &block : A, DownloadrClient -> Array(B)?) : Array(B) results = [] of B if collection pool = Fiberpool.new(collection, @config.download_batch_size) pool.run do |item| engine = @engine_class.new(@config) if reply = block.call(item, engine) results.concat(reply) end end end results end end end fetch Concurrency
  76. 76. require "fiberpool" module CrMangaDownloadr struct Concurrency(A, B) def initialize(@config : Config, @engine_class : DownloadrClient.class) end def fetch(collection : Array(A)?, &block : A, DownloadrClient -> Array(B)?) : Array(B) results = [] of B if collection pool = Fiberpool.new(collection, @config.download_batch_size) pool.run do |item| engine = @engine_class.new(@config) if reply = block.call(item, engine) results.concat(reply) end end end results end end end fetch Concurrency
  77. module CrMangaDownloadr
        class Workflow
        end

        module Steps
          def self.fetch_chapters(config : Config)
          end

          def self.fetch_pages(chapters : Array(String)?, config : Config)
            puts "Fetching pages from all chapters ..."
            reactor = Concurrency(String, String).new(config, Pages)
            reactor.fetch(chapters) do |link, engine|
              engine.try(&.fetch(link)).as(Array(String))
            end
          end

          def self.fetch_images(pages : Array(String)?, config : Config)
          end

          def self.download_images(images : Array(Image)?, config : Config)
          end

          def self.optimize_images(downloads : Array(String), config : Config)
          end

          def self.prepare_volumes(downloads : Array(String), config : Config)
          end
        end
      end
  79. crystal deps
      crystal spec
      crystal build src/cr_manga_downloadr.cr --release

      cr_manga_downloadr 752K
  81. time ./cr_manga_downloadr -t
      5.57s user 6.79s system 14% cpu 1:26.76 total
  83. 83. . !"" _build # $"" ... !"" config # $"" config.exs !"" deps # !"" ... !"" ex_manga_downloadr !"" lib # !"" ex_manga_downloadr # # !"" cli.ex # # !"" mangafox # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" mangareader # # # !"" chapter_page.ex # # # !"" index_page.ex # # # $"" page.ex # # !"" manga_wrapper.ex # # $"" workflow.ex # $"" ex_manga_downloadr.ex !"" mix.exs !"" mix.lock !"" README.md $"" test !"" ex_manga_downloadr # !"" mangafox_test.exs # $"" mangareader_test.exs !"" ex_manga_downloadr_test.exs $"" test_helper.exs 61 directories, 281 files . !"" cr_manga_downloadr !"" libs # !"" ... !"" LICENSE !"" README.md !"" shard.lock !"" shard.yml !"" spec # !"" cr_manga_downloadr # # !"" chapters_spec.cr # # !"" concurrency_spec.cr # # !"" image_downloader_spec.cr # # !"" page_image_spec.cr # # $"" pages_spec.cr # !"" fixtures # # !"" ... # $"" spec_helper.cr $"" src !"" cr_manga_downloadr # !"" chapters.cr # !"" concurrency.cr # !"" downloadr_client.cr # !"" image_downloader.cr # !"" page_image.cr # !"" pages.cr # !"" records.cr # !"" version.cr # $"" workflow.cr $"" cr_manga_downloadr.cr
  84. 84. . !"" cr_manga_downloadr !"" libs # !"" ... !"" LICENSE !"" README.md !"" shard.lock !"" shard.yml !"" spec # !"" cr_manga_downloadr # # !"" chapters_spec.cr # # !"" concurrency_spec.cr # # !"" image_downloader_spec.cr # # !"" page_image_spec.cr # # $"" pages_spec.cr # !"" fixtures # # !"" ... # $"" spec_helper.cr $"" src !"" cr_manga_downloadr # !"" chapters.cr # !"" concurrency.cr # !"" downloadr_client.cr # !"" image_downloader.cr # !"" page_image.cr # !"" pages.cr # !"" records.cr # !"" version.cr # $"" workflow.cr $"" cr_manga_downloadr.cr . !"" bin # $"" manga-downloadr !"" Gemfile !"" Gemfile.lock !"" lib # !"" manga-downloadr # # !"" chapters.rb # # !"" concurrency.rb # # !"" downloadr_client.rb # # !"" image_downloader.rb # # !"" page_image.rb # # !"" pages.rb # # !"" records.rb # # !"" version.rb # # $"" workflow.rb # $"" manga-downloadr.rb !"" LICENSE.txt !"" manga-downloadr.gemspec !"" Rakefile !"" README.md $"" spec !"" fixtures # !"" ... !"" manga-downloadr # !"" chapters_spec.rb # !"" concurrency_spec.rb # !"" image_downloader_spec.rb # !"" page_image_spec.rb # $"" pages_spec.rb $"" spec_helper.rb
  85. 85. . !"" cr_manga_downloadr !"" libs # !"" ... !"" LICENSE !"" README.md !"" shard.lock !"" shard.yml !"" spec # !"" cr_manga_downloadr # # !"" chapters_spec.cr # # !"" concurrency_spec.cr # # !"" image_downloader_spec.cr # # !"" page_image_spec.cr # # $"" pages_spec.cr # !"" fixtures # # !"" ... # $"" spec_helper.cr $"" src !"" cr_manga_downloadr # !"" chapters.cr # !"" concurrency.cr # !"" downloadr_client.cr # !"" image_downloader.cr # !"" page_image.cr # !"" pages.cr # !"" records.cr # !"" version.cr # $"" workflow.cr $"" cr_manga_downloadr.cr . !"" bin # $"" manga-downloadr !"" Gemfile !"" Gemfile.lock !"" lib # !"" manga-downloadr # # !"" chapters.rb # # !"" concurrency.rb # # !"" downloadr_client.rb # # !"" image_downloader.rb # # !"" page_image.rb # # !"" pages.rb # # !"" records.rb # # !"" version.rb # # $"" workflow.rb # $"" manga-downloadr.rb !"" LICENSE.txt !"" manga-downloadr.gemspec !"" Rakefile !"" README.md $"" spec !"" fixtures # !"" ... !"" manga-downloadr # !"" chapters_spec.rb # !"" concurrency_spec.rb # !"" image_downloader_spec.rb # !"" page_image_spec.rb # $"" pages_spec.rb $"" spec_helper.rb
  86. 86. . !"" cr_manga_downloadr !"" libs # !"" ... !"" LICENSE !"" README.md !"" shard.lock !"" shard.yml !"" spec # !"" cr_manga_downloadr # # !"" chapters_spec.cr # # !"" concurrency_spec.cr # # !"" image_downloader_spec.cr # # !"" page_image_spec.cr # # $"" pages_spec.cr # !"" fixtures # # !"" ... # $"" spec_helper.cr $"" src !"" cr_manga_downloadr # !"" chapters.cr # !"" concurrency.cr # !"" downloadr_client.cr # !"" image_downloader.cr # !"" page_image.cr # !"" pages.cr # !"" records.cr # !"" version.cr # $"" workflow.cr $"" cr_manga_downloadr.cr . !"" bin # $"" manga-downloadr !"" Gemfile !"" Gemfile.lock !"" lib # !"" manga-downloadr # # !"" chapters.rb # # !"" concurrency.rb # # !"" downloadr_client.rb # # !"" image_downloader.rb # # !"" page_image.rb # # !"" pages.rb # # !"" records.rb # # !"" version.rb # # $"" workflow.rb # $"" manga-downloadr.rb !"" LICENSE.txt !"" manga-downloadr.gemspec !"" Rakefile !"" README.md $"" spec !"" fixtures # !"" ... !"" manga-downloadr # !"" chapters_spec.rb # !"" concurrency_spec.rb # !"" image_downloader_spec.rb # !"" page_image_spec.rb # $"" pages_spec.rb $"" spec_helper.rb
  87. # Crystal
      def run
        Dir.mkdir_p @config.download_directory
        pipe Steps.fetch_chapters(@config)
          .>> Steps.fetch_pages(@config)
          .>> Steps.fetch_images(@config)
          .>> Steps.download_images(@config)
          .>> Steps.optimize_images(@config)
          .>> Steps.prepare_volumes(@config)
          .>> unwrap
        puts "Done!"
      end

      # Ruby
      def self.run(config = Config.new)
        FileUtils.mkdir_p config.download_directory
        CM(config, Workflow)
          .fetch_chapters
          .fetch_pages(config)
          .fetch_images(config)
          .download_images(config)
          .optimize_images(config)
          .prepare_volumes(config)
          .unwrap
        puts "Done!"
      end
  89. # concurrency.cr (Crystal, fiberpool)
      pool = Fiberpool.new(collection, @config.download_batch_size)
      pool.run do |item|
        engine = @engine_class.new(@config)
        if reply = block.call(item, engine)
          results.concat(reply)
        end
      end

      # concurrency.rb (Ruby, thread pool)
      pool = Thread.pool(@config.download_batch_size)
      mutex = Mutex.new
      results = []
      collection.each do |item|
        pool.process {
          engine = @turn_on_engine ? @engine_klass.new(@config.domain, @config.cache_http) : nil
          reply = block.call(item, engine)&.flatten
          mutex.synchronize do
            results += (reply || [])
          end
        }
      end
      pool.shutdown
  91. Fibers vs Threads
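The contrast this slide draws: Crystal fibers are cheap, cooperatively scheduled coroutines, while Ruby threads are OS-backed and preemptive (serialized by the GVL for pure-Ruby code, though the GVL is released during blocking IO such as these downloads). Both primitives also exist in plain Ruby; a minimal sketch:

      # Thread: scheduled preemptively by the VM/OS.
      t = Thread.new { :downloaded }
      t.value  # => :downloaded (blocks until the thread finishes)

      # Fiber: runs only when explicitly resumed, yielding control back.
      f = Fiber.new do
        Fiber.yield :first_chunk
        :second_chunk
      end
      f.resume  # => :first_chunk
      f.resume  # => :second_chunk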
  92. # Crystal
      module CrMangaDownloadr
        class Pages < DownloadrClient
          def fetch(chapter_link : String)
            html = get(chapter_link)
            nodes = html.xpath_nodes("//div[@id='selectpage']//select[@id='pageMenu']//option")
            nodes.map { |node| "#{chapter_link}/#{node.text}" }
          end
        end
      end

      # Ruby
      module MangaDownloadr
        class Pages < DownloadrClient
          def fetch(chapter_link)
            get chapter_link do |html|
              nodes = html.xpath("//div[@id='selectpage']//select[@id='pageMenu']//option")
              nodes.map { |node| [chapter_link, node.children.to_s].join("/") }
            end
          end
        end
      end
  94. time bin/manga-downloadr -t
      19.77s user 10.65s system 33% cpu 1:31.69 total
  96. Ruby/Typhoeus   (hydra_concurrency = 50)   41% CPU  1:24 min
      Elixir 1.4.5    (@max_demand = 50)        120% CPU  1:14 min
      Crystal 0.23.0  (opt_batch_size = 50)      14% CPU  1:26 min
      Ruby 2.4.1      (opt_batch_size = 50)      33% CPU  1:31 min
  100. Ruby     Typhoeus  libcurl
       Elixir   OTP       Poolboy
       Crystal  Fibers    Fiberpool
       Ruby     Thread    Thread/Pool
  104. manga-downloadr, ex_manga_downloadr, cr_manga_downloadr
       fiberpool, cr_chainable_methods, chainable_methods
  106. PREMATURE OPTIMIZATION: The Root of ALL Evil
  107. THANKS @akitaonrails slideshare.net/akitaonrails
  108. www.theconf.club
