InsideDarkWeb.com

Scrapy: next-page link is never None, and after page 146 the last page is shown again

The website has 146 pages of words, but past page 146 it just serves the last page again.

    if next_page is not None:
        yield response.follow(next_page, callback = self.parse)

With this approach the spider does not stop at page 146; it keeps going because pages 147, 148, 149, ... are the same as page 146. I tried a for loop, but that did not work. I also tried to read the href of the next-page button and break out of the function using next_extract. The output of next_extract is ['kelimeler.php?s=1'], and the number increases with the page number, e.g. ['kelimeler.php?s=2']. That did not work either.

    next_page = response.css('div.col-md-6.col-sm-6.col-xs-6:nth-child(2) a::attr(href)').get()
    next_extract = response.css('div.col-md-6.col-sm-6.col-xs-6:nth-child(2) a').xpath("@href").extract()

    print(next_page)
    print(next_extract)

    if next_extract is 'kelimeler.php?s=147':
        break
    if next_page is not None:
        yield response.follow(next_page, callback = self.parse)

What should I do to stop the scraping at page 146?

Here is the whole parse function:

    def parse(self, response):

        items = TidtutorialItem()

        all_div_kelimeler = response.css('a.collapsed')

        for tid in all_div_kelimeler:

            kelime = tid.css('a.collapsed::text').extract()
            link = tid.css('a.collapsed::text').xpath("@href").extract()

            items['Kelime'] = kelime
            items['Link'] = link

            yield items

        next_page = response.css('div.col-md-6.col-sm-6.col-xs-6:nth-child(2) a::attr(href)').get()
        next_extract = response.css('div.col-md-6.col-sm-6.col-xs-6:nth-child(2) a').xpath("@href").extract()

        print(next_page)
        print(next_extract)

        if next_page is not None:
        #if next_extract is not 'kelimeler.php?s=2':
        #for i in range (10):
            yield response.follow(next_page, callback = self.parse)

Stack Overflow Asked by Slacoff on November 15, 2021

One Answer

I can't be very precise about the best approach without seeing the page, but I can give you some suggestions.

     next_page = response.css('div.col-md-6.col-sm-6.col-xs-6:nth-child(2) a::attr(href)').get()
     next_extract = response.css('div.col-md-6.col-sm-6.col-xs-6:nth-child(2) a').xpath("@href").extract()

I'm not sure what you are trying to accomplish here, as the two selectors are essentially the same; the only difference is that the second uses the .extract() method, which returns a LIST. Because it returns a list, and because is tests object identity rather than equality, the following check will ALWAYS be false:

    if next_extract is 'kelimeler.php?s=147':
        break

Another important point is that break may only be used inside a loop. In your snippet it sits outside the for loop, so Python rejects the code with a SyntaxError ('break' outside loop) before the spider even runs, regardless of whether the condition is ever True.
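To make the list-versus-string point concrete, here is a small sketch that runs without Scrapy. The two values are stand-ins I made up for what .get() and .extract() would return on that page; they are assumptions, not output from your spider.

```python
# Stand-ins for the selector results (assumed values, not real spider output).
next_page = "kelimeler.php?s=147"        # .get() returns one string, or None
next_extract = ["kelimeler.php?s=147"]   # .extract() returns a list of strings

target = "kelimeler.php?s=147"

# `is` asks "are these the same object?"; a list object is never the string object.
identity_check = next_extract is target

# `==` does not help either while one side is a list and the other a string.
list_equals_string = next_extract == target

# Comparing string to string works, as does comparing the first list element.
string_equals_string = next_page == target
first_element_equals = bool(next_extract) and next_extract[0] == target

print(identity_check, list_equals_string)            # False False
print(string_equals_string, first_element_equals)    # True True
```

So the comparison only becomes meaningful once both sides are strings and == is used.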

Again, without seeing the page I can't say this for sure, but I believe this would accomplish what you are trying to do:

    if next_page == 'kelimeler.php?s=147':
        return

Notice next_page instead of next_extract. If you want to use the latter, remember it is a list, not a string.
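If you would rather not hard-code the exact href, a possible variation is to read the page number out of the s query parameter and compare it as an integer. This is only a sketch: page_number, should_follow and LAST_PAGE are hypothetical names of mine, and whether the cut-off should be 146 or 147 depends on how the site numbers its links, which I can't check.

```python
from urllib.parse import urlparse, parse_qs

LAST_PAGE = 146  # assumption: taken from the question, adjust if off by one


def page_number(href):
    """Return the integer value of the `s` query parameter, or None."""
    if not href:
        return None
    values = parse_qs(urlparse(href).query).get("s")
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None


def should_follow(href, last_page=LAST_PAGE):
    """True only when the link carries a page number no greater than last_page."""
    number = page_number(href)
    return number is not None and number <= last_page


print(should_follow("kelimeler.php?s=146"))  # True
print(should_follow("kelimeler.php?s=147"))  # False
print(should_follow(None))                   # False
```

Inside parse you would then write `if should_follow(next_page): yield response.follow(next_page, callback = self.parse)`, which also covers the case where next_page is None.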

Answered by renatodvc on November 15, 2021
