Modern Web Scraping with Python using Scrapy Splash Selenium
What you’ll learn
Understand the fundamentals of Web Scraping
Scrape websites using Scrapy
Understand XPath & CSS Selectors
Build a complete Spider from A to Z
Store the extracted data in MongoDB & SQLite3
Scrape JavaScript websites using Splash & Selenium
Build a CrawlSpider
Understand the Crawling behavior
Build a custom Middleware
Web Scraping best practices
Avoid getting banned while scraping websites
Bypass Cloudflare
Scrape APIs
Scrape infinite scroll websites
Working with Cookies
Deploy spiders locally and to the cloud
Run spiders periodically
Prevent storing duplicated data
Build datasets
Log in to websites using Scrapy
Download images and files using Scrapy
Requirements
Basics of Python
Internet access
Description
Web scraping has become one of the hottest topics, and there are plenty of paid tools on the market. Those tools don’t show you how things are done, and as a consumer you will always be limited to their functionality.
In this course you won’t be a consumer anymore. I’ll teach you how to build your own scraping tool (a spider) using Scrapy.
You will learn:
The fundamentals of web scraping
How to build a complete spider
The fundamentals of XPath & CSS Selectors
How to locate content/nodes from the DOM using XPath & CSS
How to store the data in JSON, CSV… and even in an external database (MongoDB & SQLite3)
How to write your own custom Pipeline
Fundamentals of Splash
How to scrape JavaScript websites using Scrapy Splash & Selenium
The crawling behavior
How to build a CrawlSpider
How to avoid getting banned while scraping websites
How to build a custom Middleware
Web scraping best practices
How to scrape APIs
How to use Request Cookies
How to scrape infinite scroll websites
Host spiders in Heroku for free
Run spiders periodically with a custom script
Prevent storing duplicated data
Deploy Splash to Heroku
Write data to Excel files
Log in to websites using Scrapy
Download Files & Images using Scrapy
Use Proxies with Scrapy Spider
Use Crawlera with Scrapy & Splash
Use Proxies with CrawlSpider
What makes this course different from the others, and why should you enroll?
First, this is the most up-to-date course. You will be using Python 3.7, Scrapy 1.6 and Splash 3.0.
You will have an in-depth, step-by-step guide on how to become a professional web scraper.
You will learn how to use Splash & Selenium to scrape JavaScript websites, and I can assure you that you won’t find any tutorial out there that teaches how to really use Splash the way I do in this course.
You will learn how to host spiders in Heroku as well as Splash (Exclusive).
You will learn how to create a custom script so spiders can run periodically without any intervention from you.
30-day money-back guarantee by Udemy
So whether you are a data analyst who wants to add web scraping to your tool set, or someone who wants to learn how to extract data from unstructured HTML web pages and store it back in a structured way for analysis, you are welcome to join this course.
STUDENTS’ THOUGHTS ABOUT THIS COURSE
“I was particularly looking for web scraping using XPATHs and this course is addressing that. It also covers dynamic paging. A proper mix of theory and practical. A must-have for those who wants to do web scraping. GREAT learning experience!!!” By Hiran Kumar
“90% of what I was searching for!!! Great job!! Clear explanations and great communication with Ahmed.” By Raylyson Estanista
“Ahmed’s Web scraping course is awesome. His approach using Python with scrapy and splash works well with all websites especially those that make heavy use of JavaScript. Ahmed is a gifted educator: expert communicator, passionate, conscientious and accessible to his students. I highly recommend this course and any of Ahmed Rafik’s Udemy courses.” By Richard Blackmon
“Great course, and a nice introduction to Scrapy (I’m someone with no Python experience whatsoever).” By I S
“Excellent course. Quick and thorough at the same time. Ahmed is incredibly responsive to the students and often replies to questions within minutes! Highest recommendation.” By Robert Nolte
“That course is very good and explanation is crystal clear! The instructor is very supportive in case of questions. Highly recommended.” By Shubina Ekaterina
“I like the course. Clear explanations and good communication with Ahmed. All topics is interesting and full of information. I improved my skills in Scrapy. Author update course content by new videos. It’s a big bonus) Explained more advance topics I never see in other courses. Thank you, Ahmed. Waiting for new videos)” By Ruslan Romanenko
Overview
Section 1: Introduction
Lecture 1 Intro to Web Scraping & Scrapy
Lecture 2 Setting up the Scrapy Development Environment (Updated)
Lecture 3 Add VSCODE to path (Mac users)
Lecture 4 Udemy 101 (Please don’t skip*)
Lecture 5 Asking questions
Section 2: Scrapy Fundamentals
Lecture 6 Scrapy fundamentals PART 1
Lecture 7 Scrapy fundamentals PART 2
Lecture 8 Scrapy fundamentals PART 3
Lecture 9 Scrapy fundamentals PART 4
Lecture 10 Scrapy fundamentals PART 5
Section 3: XPath expressions & CSS Selectors
Lecture 11 Downloadable files
Lecture 12 XPath & CSS Selectors
Lecture 13 CSS Selectors fundamentals
Lecture 14 CSS selectors in theory
Lecture 15 XPath fundamentals
Lecture 16 Navigating using XPath(Going UP)
Lecture 17 Navigating using XPath(Going DOWN)
Lecture 18 XPath in theory
Section 4: Project 1 Spiders from A to Z
Lecture 19 Worldometers PART 1
Lecture 20 Worldometers PART 2
Lecture 21 Worldometers PART 3
Lecture 22 Worldometers PART 4
Lecture 23 Project source code
Lecture 24 Exercise
Section 5: Building Datasets
Lecture 25 Building datasets
Section 6: Project 2 Dealing with Multiple pages
Lecture 26 Website URL (Please do not skip)
Lecture 27 Setting up the project
Lecture 28 Setting up the project – Code update –
Lecture 29 Building the spider
Lecture 30 Dealing with pagination
Lecture 31 Spoofing request headers
Lecture 32 TinyDeal project source code
Lecture 33 Exercise 2
Section 7: Debugging spiders
Lecture 34 What is debugging?
Lecture 35 Debugging spiders PART 1
Lecture 36 Debugging spiders PART 2
Section 8: Let’s take a break !
Lecture 37 The “whys” & “whens” of web scraping
Lecture 38 Web scraping challenges
Section 9: Project 3 Build Crawlers using Scrapy
Lecture 39 Website URL update
Lecture 40 Crawl spider structure
Lecture 41 The Rule object
Lecture 42 Following links in pagination
Lecture 43 Spoofing request headers
Lecture 44 Project source code
Lecture 45 Exercise
Section 10: Splash crash course
Lecture 46 What dilemma Splash came to solve
Lecture 47 Setting up Splash (Windows Pro/Enterprise edition & macOS)
Lecture 48 Setting up Splash(Windows Home Edition)
Lecture 49 Setting up Splash (Linux)
Lecture 50 Introduction to Splash
Lecture 51 Working with elements
Lecture 52 Spoofing request headers
Section 11: Project 4 Scraping JavaScript websites using Splash
Lecture 53 Website URL update
Lecture 54 Splash incognito mode
Lecture 55 Using Splash with Scrapy
Lecture 56 Parsing (BAD HTML MARKUP)
Lecture 57 Project source code
Lecture 58 Exercise
Section 12: Project 5 Scraping JavaScript websites using Selenium
Lecture 59 Selenium basics
Lecture 60 ElementNotInteractable Exception
Lecture 61 Selenium with Scrapy
Lecture 62 Selenium Middleware PART 1 (NEW)
Lecture 63 Selenium Middleware PART 2 (NEW)
Lecture 64 Project source code
Section 13: Working with Pipelines
Lecture 65 Pipelines
Lecture 66 Storing data in MongoDB
Lecture 67 Storing data in SQLite3
Lecture 68 Project source code
Section 14: Scraping APIs (NEW)
Lecture 69 Scraping APIs PART 1
Lecture 70 Scraping APIs PART 2
Lecture 71 Scraping APIs PART 3
Lecture 72 Scraping APIs PART 4
Lecture 73 Scraping APIs PART 5
Lecture 74 Project source code
Section 15: Log in to websites (NEW)
Lecture 75 Log in to websites PART 1
Lecture 76 Log in to websites PART 2
Lecture 77 Log in to websites PART 3 (JavaScript required)
Lecture 78 Project source code
Section 16: Project 6 Bypass Cloudflare
Lecture 79 Website URL update
Lecture 80 Bypass Cloudflare PART 1
Lecture 81 Bypass Cloudflare PART 2
Lecture 82 Project source code
Section 17: APPENDIX (OLDER SCRAPY 1.5 CONTENT)
Lecture 83 *IMPORTANT*
Lecture 84 Avoid getting banned PART 1
Lecture 85 Avoid getting banned PART 2
Lecture 86 Avoid getting banned PART 3
Lecture 87 Scraping APIs PART 1
Lecture 88 Scraping APIs PART 2
Lecture 89 Scraping APIs PART 3
Lecture 90 Scraping APIs PART 4
Lecture 91 Hidden XHR
Lecture 92 Scraping APIs PART 5
Lecture 93 IMPORTANT NOTE
Lecture 94 Scraping APIs PART 6
Lecture 95 Spider Arguments
Lecture 96 Scraping APIs PART 7
Lecture 97 *IMPORTANT*
Lecture 98 Another way to scrape Airbnb restaurant detail page
Lecture 99 Deploying spiders PART 1
Lecture 100 Deploying spiders PART 2
Lecture 101 Deploying spiders PART 3
Lecture 102 Deploying spiders PART 4
Lecture 103 Execute spiders periodically
Lecture 104 Deploy Splash to Heroku
Lecture 105 *IMPORTANT*
Lecture 106 Project source code
Lecture 107 Project source code
Lecture 108 Challenge for those who are adventurous
Lecture 109 Login to websites using FormRequest
Lecture 110 XML Http Post Requests
Lecture 111 Project source code
Lecture 112 Code UPDATE XHR repeated data (Assignment)
Lecture 113 Media Pipelines
Lecture 114 The Images Pipeline
Lecture 115 Extending The Images Pipeline (Store images with custom names)
Lecture 116 *IMPORTANT*
Lecture 117 Files Pipeline (Article)
Lecture 118 Challenge (Files Pipeline)
Lecture 119 Project source code
Lecture 120 Using Crawlera with Scrapy
Lecture 121 Using Crawlera with Splash
Lecture 122 Using Heroku as a Proxy (FREE)
Lecture 123 Using FREE Proxies with the CrawlSpider
Lecture 124 *IMPORTANT*
Lecture 125 Challenge
Lecture 126 Project source code
Section 18: BONUS
Lecture 127 Files Pipeline
Lecture 128 Bonus Lecture
Who this course is for
Anyone who wants to scrape data from any website
Anyone who wants to learn Scrapy
Anyone who wants to automate the task of copying contents from websites
Anyone who wants to learn how to scrape JavaScript websites using Scrapy-Splash & Selenium
Course Information:
Udemy | English | 8h 51m | 5.04 GB
Created by: Ahmed Rafik