Website Auto Scraping with AutoIt and .NET HttpRequest

This is a high-level design for automated website scraping with AutoIt and .NET HttpRequest. If you are interested in the source code, please let me know.

Published in: Engineering

  1. AuTo Scraping. Blackie Tsai (blackie1019@gmail.com), 29 April 2016
  2. Agenda
     • Background
     • Behavior and System Analysis
     • HLD
  3. Background
  4. Background
     • User requirements
       • A desktop application that scrapes Odds and OnlineList data from a rental site
       • 1,652 links in total, covering 165,200 records
       • Data is formatted and exported to an Excel file
     • Non-functional requirement
       • Avoid account lockout caused by request patterns that resemble a DDoS attack
  5. Behavior and System Analysis
  6. Behavior Analysis
     • The website requires login first
     • The login session stays alive while the user idles on the MainPage (the client posts a periodic sync request)
     • After login, each click on the MainPage opens a pop-up window to display the data
     • Each data page displays 100 records for the filter you supply
  7. System Analysis
     • Security
       • The website requires login to obtain the SessionId and StickyId used in requests
       • The website has a security mechanism that redirects invalid requests
       • A one-time token, issued on login to the Main page, prevents requests for page data without permission
       • All sub-pages (pop-up windows) can only be opened from the Main page
       • The website uses RESTful-like routing that includes the UserSession token
     • Routing and request posting
       • URL routing includes the login token
       • MainPage routing includes the BuildVersion
       • Requests to the Odds and OnlineList services need a query key (passed from the Main window)
  8. Challenges
     • Issue 1: Too many links to scrape with Selenium or a similar solution.
     • Issue 2: Some data must be decrypted and regenerated with JavaScript (RSA token, one-time token, etc.).
     • Issue 3: Response headers (Session and StickyId) must be captured to mock the requests that query the Odds and OnlineList services.
  9. HLD
  10. Use of Technology
      • C# and .NET Framework
      • AutoIt (Download)
        • AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting.
        • Has a script editor for building scripts
        • Can execute shell scripts
        • Can compile scripts to .exe files
      • AutoItX, aka NAutoIt (Download)
        • Methods available in AutoIt BASIC but not provided via AutoItX are replaced by .NET counterparts.
        • AutoItX works with PowerShell, .NET, C, COM, COM interop and reg-free COM interfaces.
  11. HLD
  12. Use of Technology - AutoIt
      • AutoIt Window Info
  13. Use of Technology - AutoIt
      • SciTE script editor
        • Write scripts in an IDE with code hints and a Help guide
  14. Use of Technology - AutoIt
      • Run Script
        • Execute an .au3 or .a3x file
      • Compile Script to .exe
        • Convert an .au3 script to an .exe or .a3x file
  15. Use of Technology - AutoItX
      • Setup
        • Add a project reference to AutoItX3.Assembly.dll
        • Add AutoItX3.dll to the project and set it to copy to the output directory
  16. Q & A
  17. Thank you!
      11F., No.399, Ruiguang Rd., Neihu Dist., Taipei City 114, Taiwan
      TEL: +886 2 2798 8529 Fax: +886 2 2798 8531
      Website: www.xuenn.com
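
The non-functional requirement on slide 4 (do not look like a DDoS while walking 1,652 links) could be approached with simple request pacing. The following is a minimal sketch in Python rather than the deck's C#, and it is not the author's implementation. The names Throttle, total_records and estimate_minutes, and the 2-second interval in the usage note, are hypothetical.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests.

    Hypothetical helper: the clock and sleep functions are injectable so
    the pacing logic can be exercised without real waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def total_records(link_count, records_per_page):
    """Records to collect if every link yields one full data page."""
    return link_count * records_per_page


def estimate_minutes(link_count, min_interval):
    """Lower bound on run time, in minutes, when requests are paced."""
    return link_count * min_interval / 60.0
```

Calling wait() before each request keeps the request rate bounded. With the deck's figures, total_records(1652, 100) gives 165,200 records, and an assumed 2-second interval puts a full run at roughly 55 minutes.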

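Issue 3 on slide 8 (capture the Session and StickyId response headers, then mock the data requests) can be illustrated roughly as follows. This is a Python standard-library sketch, not the deck's .NET HttpRequest code. The cookie names as literal strings, the "key" and "page" query parameters, and the example.invalid URL are all assumptions made for illustration, and the sketch assumes the two values arrive as Set-Cookie headers.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode
from urllib.request import Request


def extract_cookies(set_cookie_headers, wanted=("SessionId", "StickyId")):
    """Pick the wanted cookie values out of raw Set-Cookie header strings.

    The cookie names are assumptions; the real site may use other names.
    """
    found = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            if name in wanted:
                found[name] = morsel.value
    return found


def build_data_request(base_url, cookies, query_key, page):
    """Build (but do not send) a request that replays the login cookies.

    "key" and "page" are hypothetical parameter names standing in for the
    query key passed from the Main window and the data page to fetch.
    """
    query = urlencode({"key": query_key, "page": page})
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(f"{base_url}?{query}", headers={"Cookie": cookie_header})
```

A caller would feed the login response's Set-Cookie headers to extract_cookies and pass the result to build_data_request for each Odds or OnlineList page. The one-time token and RSA steps from Issue 2 are not covered here.
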