Modern Web Scraping with Python using Scrapy Splash Selenium

Become an expert in web scraping and web crawling using Python 3, Scrapy, Splash and Selenium 2nd EDITION (2021)
File Size: 5.09 GB
Total length: 8h 51m
Instructor: Ahmed Rafik
Last update: 5/2021
Ratings: 4.4/5

What you’ll learn

Understand the fundamentals of Web Scraping
Scrape websites using Scrapy
Understand XPath & CSS Selectors
Build a complete Spider from A to Z
Store the extracted data in MongoDB & SQLite3
Scrape JavaScript websites using Splash & Selenium
Build a CrawlSpider
Understand the Crawling behavior
Build a custom Middleware
Web Scraping best practices
Avoid getting banned while scraping websites
Bypass Cloudflare
Scrape APIs
Scrape infinite scroll websites
Working with Cookies
Deploy spiders locally and to the cloud
Run spiders periodically
Prevent storing duplicated data
Build datasets
Log in to websites using Scrapy
Download images and files using Scrapy
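
To make two of the items above more concrete (storing data in SQLite3 and preventing duplicated data), here is a minimal sketch. It is not taken from the course material. The class name, the `products` table, the item fields and the example.com URLs are all hypothetical. The method names mirror Scrapy's item pipeline hooks (`open_spider`, `process_item`, `close_spider`), but the code uses only the standard library so it runs without Scrapy installed.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline: stores items in SQLite and skips duplicates."""

    def __init__(self, db_path=":memory:"):
        # ":memory:" keeps the demo self-contained; a real project would use a file path.
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider=None):
        # Called once when the spider starts: open the database and create the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "url TEXT PRIMARY KEY, title TEXT, price TEXT)"
        )

    def process_item(self, item, spider=None):
        # The PRIMARY KEY on url plus INSERT OR IGNORE means a second item
        # with the same url is silently skipped instead of stored twice.
        self.connection.execute(
            "INSERT OR IGNORE INTO products (url, title, price) VALUES (?, ?, ?)",
            (item["url"], item["title"], item["price"]),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider=None):
        # Called once when the spider finishes.
        self.connection.close()


def stored_rows(pipeline):
    """Return how many rows are currently stored."""
    cursor = pipeline.connection.execute("SELECT COUNT(*) FROM products")
    return cursor.fetchone()[0]


# Usage: feed the same (made-up) item twice, then a different one.
pipeline = SQLitePipeline()
pipeline.open_spider()
first = {"url": "https://example.com/p/1", "title": "Phone case", "price": "4.99"}
second = {"url": "https://example.com/p/2", "title": "USB cable", "price": "2.49"}
pipeline.process_item(first)
pipeline.process_item(dict(first))  # duplicate url, ignored by the database
pipeline.process_item(second)
row_count = stored_rows(pipeline)
print(row_count)
pipeline.close_spider()
```

In an actual Scrapy project a class like this would typically be enabled through the `ITEM_PIPELINES` setting, and Scrapy itself would call the three hooks. Letting the database enforce uniqueness is only one possible design; checking for duplicates in Python before inserting is another.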

Requirements

Basics of Python
Internet access

Description

Web scraping has become one of the hottest topics. There are plenty of paid tools on the market, but they don’t show you how things are done, and as a consumer you will always be limited to the features they offer.

In this course you won’t be a consumer anymore. I’ll teach you how to build your own scraping tool (a spider) using Scrapy.

You will learn:

The fundamentals of Web Scraping
How to build a complete spider
The fundamentals of XPath & CSS Selectors
How to locate content/nodes from the DOM using XPath & CSS
How to store the data in JSON, CSV… and even to an external database (MongoDB & SQLite3)
How to write your own custom Pipeline
Fundamentals of Splash
How to scrape JavaScript websites using Scrapy Splash & Selenium
The Crawling behavior
How to build a CrawlSpider
How to avoid getting banned while scraping websites
How to build a custom Middleware
Web Scraping best practices
How to scrape APIs
How to use Request Cookies
How to scrape infinite scroll websites
Host spiders in Heroku for free
Run spiders periodically with a custom script
Prevent storing duplicated data
Deploy Splash to Heroku
Write data to Excel files
Log in to websites using Scrapy
Download Files & Images using Scrapy
Use Proxies with Scrapy Spider
Use Crawlera with Scrapy & Splash
Use Proxies with CrawlSpider

What makes this course different from the others, and why should you enroll?

First, this is the most up-to-date course. You will be using Python 3.7, Scrapy 1.6 and Splash 3.0.
You will have an in-depth, step-by-step guide on how to become a professional web scraper.
You will learn how to use Splash & Selenium to scrape JavaScript websites, and I can assure you that you won’t find any tutorial out there that teaches how to really use Splash the way I do in this course.
You will learn how to host spiders in Heroku as well as Splash (Exclusive).
You will learn how to create a custom script so spiders can run periodically without any intervention from you.
30-day money-back guarantee by Udemy.

So whether you are a data analyst who wants to add web scraping to your tool set, or someone who wants to learn how to extract data from unstructured HTML web pages and store it in a structured way for analysis, you are welcome to join this course.

STUDENTS’ THOUGHTS ABOUT THIS COURSE

“I was particularly looking for web scraping using XPATHs and this course is addressing that. It also covers dynamic paging. A proper mix of theory and practical. A must-have for those who wants to do web scraping. GREAT learning experience!!!” By Hiran Kumar

“90% of what I was searching for!!! Great job!! Clear explanations and great communication with Ahmed.” By Raylyson Estanista

“Ahmed’s Web scraping course is awesome. His approach using Python with scrapy and splash works well with all websites especially those that make heavy use of JavaScript. Ahmed is a gifted educator: expert communicator, passionate, conscientious and accessible to his students. I highly recommend this course and any of Ahmed Rafik’s Udemy courses.” By Richard Blackmon

“Great course, and a nice introduction to Scrapy (I’m someone with no Python experience whatsoever).” By I S

“Excellent course. Quick and thorough at the same time. Ahmed is incredibly responsive to the students and often replies to questions within minutes! Highest recommendation.” By Robert Nolte

“That course is very good and explanation is crystal clear! The instructor is very supportive in case of questions. Highly recommended.” By Shubina Ekaterina

“I like the course. Clear explanations and good communication with Ahmed. All topics is interesting and full of information. I improved my skills in Scrapy. Author update course content by new videos. It’s a big bonus) Explained more advance topics I never see in other courses. Thank you, Ahmed. Waiting for new videos)” By Ruslan Romanenko

Overview

Section 1: Introduction

Lecture 1 Intro to Web Scraping & Scrapy

Lecture 2 Setting up the Scrapy Development Environment (Updated)

Lecture 3 Add VSCODE to path (Mac users)

Lecture 4 Udemy 101 (Please don’t skip*)

Lecture 5 Asking questions

Section 2: Scrapy Fundamentals

Lecture 6 Scrapy fundamentals PART 1

Lecture 7 Scrapy fundamentals PART 2

Lecture 8 Scrapy fundamentals PART 3

Lecture 9 Scrapy fundamentals PART 4

Lecture 10 Scrapy fundamentals PART 5

Section 3: XPath expressions & CSS Selectors

Lecture 11 Downloadable files

Lecture 12 XPath & CSS Selectors

Lecture 13 CSS Selectors fundamentals

Lecture 14 CSS selectors in theory

Lecture 15 XPath fundamentals

Lecture 16 Navigating using XPath (Going UP)

Lecture 17 Navigating using XPath (Going DOWN)

Lecture 18 XPath in theory

Section 4: Project 1 Spiders from A to Z

Lecture 19 Worldometers PART 1

Lecture 20 Worldometers PART 2

Lecture 21 Worldometers PART 3

Lecture 22 Worldometers PART 4

Lecture 23 Project source code

Lecture 24 Exercise

Section 5: Building Datasets

Lecture 25 Building datasets

Section 6: Project 2 Dealing with Multiple pages

Lecture 26 Website URL (Please do not skip)

Lecture 27 Setting up the project

Lecture 28 Setting up the project – Code update –

Lecture 29 Building the spider

Lecture 30 Dealing with pagination

Lecture 31 Spoofing request headers

Lecture 32 TinyDeal project source code

Lecture 33 Exercise 2

Section 7: Debugging spiders

Lecture 34 What is debugging?

Lecture 35 Debugging spiders PART 1

Lecture 36 Debugging spiders PART 2

Section 8: Let’s take a break !

Lecture 37 The “whys” & “whens” of web scraping

Lecture 38 Web scraping challenges

Section 9: Project 3 Build Crawlers using Scrapy

Lecture 39 Website URL update

Lecture 40 Crawl spider structure

Lecture 41 The Rule object

Lecture 42 Following links in pagination

Lecture 43 Spoofing request headers

Lecture 44 Project source code

Lecture 45 Exercise

Section 10: Splash crash course

Lecture 46 What dilemma splash came to solve

Lecture 47 Setting up Splash (Windows Pro/Enterprise edition & macOS)

Lecture 48 Setting up Splash (Windows Home Edition)

Lecture 49 Setting up Splash (Linux)

Lecture 50 Introduction to Splash

Lecture 51 Working with elements

Lecture 52 Spoofing request headers

Section 11: Project 4 Scraping JavaScript websites using Splash

Lecture 53 Website URL update

Lecture 54 Splash incognito mode

Lecture 55 Using Splash with Scrapy

Lecture 56 Parsing (BAD HTML MARKUP)

Lecture 57 Project source code

Lecture 58 Exercise

Section 12: Project 5 Scraping JavaScript websites using Selenium

Lecture 59 Selenium basics

Lecture 60 ElementNotInteractable Exception

Lecture 61 Selenium with Scrapy

Lecture 62 Selenium Middleware PART 1 (NEW)

Lecture 63 Selenium Middleware PART 2 (NEW)

Lecture 64 Project source code

Section 13: Working with Pipelines

Lecture 65 Pipelines

Lecture 66 Storing data in MongoDB

Lecture 67 Storing data in SQLite3

Lecture 68 Project source code

Section 14: Scraping APIs (NEW)

Lecture 69 Scraping APIs PART 1

Lecture 70 Scraping APIs PART 2

Lecture 71 Scraping APIs PART 3

Lecture 72 Scraping APIs PART 4

Lecture 73 Scraping APIs PART 5

Lecture 74 Project source code

Section 15: Log in to websites (NEW)

Lecture 75 Log in to websites PART 1

Lecture 76 Log in to websites PART 2

Lecture 77 Log in to websites PART 3 (JavaScript required)

Lecture 78 Project source code

Section 16: Project 6 Bypass Cloudflare

Lecture 79 Website URL update

Lecture 80 Bypass Cloudflare PART 1

Lecture 81 Bypass Cloudflare PART 2

Lecture 82 Project source code

Section 17: APPENDIX (OLDER SCRAPY 1.5 CONTENT)

Lecture 83 *IMPORTANT*

Lecture 84 Avoid getting banned PART 1

Lecture 85 Avoid getting banned PART 2

Lecture 86 Avoid getting banned PART 3

Lecture 87 Scraping APIs PART 1

Lecture 88 Scraping APIs PART 2

Lecture 89 Scraping APIs PART 3

Lecture 90 Scraping APIs PART 4

Lecture 91 Hidden XHR

Lecture 92 Scraping APIs PART 5

Lecture 93 IMPORTANT NOTE

Lecture 94 Scraping APIs PART 6

Lecture 95 Spider Arguments

Lecture 96 Scraping APIs PART 7

Lecture 97 *IMPORTANT*

Lecture 98 Another way to scrape Airbnb restaurant detail page

Lecture 99 Deploying spiders PART 1

Lecture 100 Deploying spiders PART 2

Lecture 101 Deploying spiders PART 3

Lecture 102 Deploying spiders PART 4

Lecture 103 Execute spiders periodically

Lecture 104 Deploy Splash to Heroku

Lecture 105 *IMPORTANT*

Lecture 106 Project source code

Lecture 107 Project source code

Lecture 108 Challenge for those who are adventurous

Lecture 109 Login to websites using FormRequest

Lecture 110 XML Http Post Requests

Lecture 111 Project source code

Lecture 112 Code UPDATE XHR repeated data (Assignment)

Lecture 113 Media Pipelines

Lecture 114 The Images Pipeline

Lecture 115 Extending The Images Pipeline (Store images with custom names)

Lecture 116 *IMPORTANT*

Lecture 117 Files Pipeline (Article)

Lecture 118 Challenge (Files Pipeline)

Lecture 119 Project source code

Lecture 120 Using Crawlera with Scrapy

Lecture 121 Using Crawlera with Splash

Lecture 122 Using Heroku as a Proxy (FREE)

Lecture 123 Using FREE Proxies with the CrawlSpider

Lecture 124 *IMPORTANT*

Lecture 125 Challenge

Lecture 126 Project source code

Section 18: BONUS

Lecture 127 Files Pipeline

Lecture 128 Bonus Lecture

Who this course is for

Anyone who wants to scrape data from any website
Anyone who wants to learn Scrapy
Anyone who wants to automate the task of copying contents from websites
Anyone who wants to learn how to scrape JavaScript websites using Scrapy-Splash & Selenium

Course Information:

Udemy | English | 8h 51m | 5.09 GB
Created by: Ahmed Rafik
