[fix] engine & network issues / documentation and type annotations

This patch fixes some quirks and issues related to the engines and the network.
Each engine has its own network and this network was broken for the following
engines[1]:

- archlinux
- bing
- dailymotion
- duckduckgo
- google
- peertube
- startpage
- wikipedia

Since the files have been touched anyway, the type annotations of the engine
modules have also been completed so that error messages from the type checker
are no longer reported.

Related and (partially) fixed issues:

- [1] https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
- [2] https://github.com/searxng/searxng/issues/2513
- [3] https://github.com/searxng/searxng/issues/2515

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Markus Heiser 2023-06-25 12:37:31 +02:00 committed by Markus Heiser
parent 2e4a435134
commit e8706fb738
13 changed files with 204 additions and 122 deletions


@ -397,14 +397,26 @@ Communication with search engines.
 Global timeout of the requests made to others engines in seconds. A bigger
 timeout will allow to wait for answers from slow engines, but in consequence
 will slow SearXNG reactivity (the result page may take the time specified in the
-timeout to load). Can be override by :ref:`settings engine`
+timeout to load). Can be override by ``timeout`` in the :ref:`settings engine`.
 ``useragent_suffix`` :
 Suffix to the user-agent SearXNG uses to send requests to others engines. If an
 engine wish to block you, a contact info here may be useful to avoid that.
+.. _Pool limit configuration: https://www.python-httpx.org/advanced/#pool-limit-configuration
+``pool_maxsize``:
+Number of allowable keep-alive connections, or ``null`` to always allow. The
+default is 10. See ``max_keepalive_connections`` `Pool limit configuration`_.
+``pool_connections`` :
+Maximum number of allowable connections, or ``null`` for no limits. The
+default is 100. See ``max_connections`` `Pool limit configuration`_.
 ``keepalive_expiry`` :
 Number of seconds to keep a connection in the pool. By default 5.0 seconds.
+See ``keepalive_expiry`` `Pool limit configuration`_.
 .. _httpx proxies: https://www.python-httpx.org/advanced/#http-proxying
@ -429,15 +441,6 @@ Communication with search engines.
 Number of retry in case of an HTTP error. On each retry, SearXNG uses an
 different proxy and source ip.
-``retry_on_http_error`` :
-Retry request on some HTTP status code.
-Example:
-* ``true`` : on HTTP status code between 400 and 599.
-* ``403`` : on HTTP status code 403.
-* ``[403, 429]``: on HTTP status code 403 and 429.
 ``enable_http2`` :
 Enable by default. Set to ``false`` to disable HTTP/2.
@ -455,6 +458,11 @@ Communication with search engines.
 ``max_redirects`` :
 30 by default. Maximum redirect before it is an error.
+``using_tor_proxy`` :
+Using tor proxy (``true``) or not (``false``) for all engines. The default is
+``false`` and can be overwritten in the :ref:`settings engine`
 .. _settings categories_as_tabs:
@ -522,13 +530,14 @@ engine is shown. Most of the options have a default value or even are optional.
 use_official_api: true
 require_api_key: true
 results: HTML
-enable_http: false
+# overwrite values from section 'outgoing:'
 enable_http2: false
 retries: 1
-retry_on_http_error: true # or 403 or [404, 429]
 max_connections: 100
 max_keepalive_connections: 10
 keepalive_expiry: 5.0
+using_tor_proxy: false
 proxies:
 http:
 - http://proxy1:8080
@ -539,6 +548,11 @@ engine is shown. Most of the options have a default value or even are optional.
 - socks5://user:password@proxy3:1080
 - socks5h://user:password@proxy4:1080
+# other network settings
+enable_http: false
+retry_on_http_error: true # or 403 or [404, 429]
 ``name`` :
 Name that will be used across SearXNG to define this engine. In settings, on
 the result page...
@ -579,7 +593,8 @@ engine is shown. Most of the options have a default value or even are optional.
 query all search engines in that category (group).
 ``timeout`` : optional
-Timeout of the search with the current search engine. **Be careful, it will
+Timeout of the search with the current search engine. Overwrites
+``request_timeout`` from :ref:`settings outgoing`. **Be careful, it will
 modify the global timeout of SearXNG.**
 ``api_key`` : optional
@ -615,6 +630,37 @@ engine is shown. Most of the options have a default value or even are optional.
 - ``ipv4`` set ``local_addresses`` to ``0.0.0.0`` (use only IPv4 local addresses)
 - ``ipv6`` set ``local_addresses`` to ``::`` (use only IPv6 local addresses)
+``enable_http`` : optional
+Enable HTTP for this engine (by default only HTTPS is enabled).
+``retry_on_http_error`` : optional
+Retry request on some HTTP status code.
+Example:
+* ``true`` : on HTTP status code between 400 and 599.
+* ``403`` : on HTTP status code 403.
+* ``[403, 429]``: on HTTP status code 403 and 429.
+``proxies`` :
+Overwrites proxy settings from :ref:`settings outgoing`.
+``using_tor_proxy`` :
+Using tor proxy (``true``) or not (``false``) for this engine. The default is
+taken from ``using_tor_proxy`` of the :ref:`settings outgoing`.
+``max_keepalive_connections`` :
+`Pool limit configuration`_, overwrites value ``pool_maxsize`` from
+:ref:`settings outgoing` for this engine.
+``max_connections`` :
+`Pool limit configuration`_, overwrites value ``pool_connections`` from
+:ref:`settings outgoing` for this engine.
+``keepalive_expiry`` :
+`Pool limit configuration`_, overwrites value ``keepalive_expiry`` from
+:ref:`settings outgoing` for this engine.
 .. note::
 A few more options are possible, but they are pretty specific to some
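
The three accepted forms of ``retry_on_http_error`` documented in this hunk (``true``, a single status code, or a list of status codes) can be illustrated with a small sketch. The helper name ``should_retry`` is hypothetical and is not taken from the SearXNG code base; it only mirrors the semantics described in the documentation text above.

```python
from typing import List, Union

# Accepted shapes of the setting, as described in the docs hunk above.
RetrySetting = Union[bool, int, List[int]]


def should_retry(status_code: int, retry_on_http_error: RetrySetting) -> bool:
    """Hypothetical helper: decide whether a status code triggers a retry."""
    # bool must be tested before int, because bool is a subclass of int
    if isinstance(retry_on_http_error, bool):
        # ``true`` means: retry on any status code between 400 and 599
        return retry_on_http_error and 400 <= status_code <= 599
    if isinstance(retry_on_http_error, int):
        # a single status code, e.g. 403
        return status_code == retry_on_http_error
    # a list of status codes, e.g. [403, 429]
    return status_code in retry_on_http_error


print(should_retry(503, True))
print(should_retry(403, 403))
print(should_retry(404, [403, 429]))
```

The order of the ``isinstance`` checks matters in this sketch: testing for ``int`` first would treat ``True`` as the status code ``1``.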


@ -17,7 +17,7 @@
from __future__ import annotations from __future__ import annotations
from typing import Union, Dict, List, Callable, TYPE_CHECKING from typing import List, Callable, TYPE_CHECKING
if TYPE_CHECKING: if TYPE_CHECKING:
from searx.enginelib import traits from searx.enginelib import traits
@ -134,3 +134,15 @@ class Engine: # pylint: disable=too-few-public-methods
require_api_key: true require_api_key: true
results: HTML results: HTML
""" """
using_tor_proxy: bool
"""Using tor proxy (``true``) or not (``false``) for this engine."""
send_accept_language_header: bool
"""When this option is activated, the language (locale) that is selected by
the user is used to build and send a ``Accept-Language`` header in the
request to the origin search engine."""
tokens: List[str]
"""A list of secret tokens to make this engine *private*, more details see
:ref:`private engines`."""


@ -13,6 +13,7 @@ used.
from __future__ import annotations from __future__ import annotations
import json import json
import dataclasses import dataclasses
import types
from typing import Dict, Iterable, Union, Callable, Optional, TYPE_CHECKING from typing import Dict, Iterable, Union, Callable, Optional, TYPE_CHECKING
from typing_extensions import Literal, Self from typing_extensions import Literal, Self
@ -82,8 +83,7 @@ class EngineTraits:
""" """
custom: Dict[str, Union[Dict[str, Dict], Iterable[str]]] = dataclasses.field(default_factory=dict) custom: Dict[str, Union[Dict[str, Dict], Iterable[str]]] = dataclasses.field(default_factory=dict)
"""A place to store engine's custom traits, not related to the SearXNG core """A place to store engine's custom traits, not related to the SearXNG core.
""" """
def get_language(self, searxng_locale: str, default=None): def get_language(self, searxng_locale: str, default=None):
@ -228,7 +228,7 @@ class EngineTraitsMap(Dict[str, EngineTraits]):
return obj return obj
def set_traits(self, engine: Engine): def set_traits(self, engine: Engine | types.ModuleType):
"""Set traits in a :py:obj:`Engine` namespace. """Set traits in a :py:obj:`Engine` namespace.
:param engine: engine instance build by :py:func:`searx.engines.load_engine` :param engine: engine instance build by :py:func:`searx.engines.load_engine`


@ -17,7 +17,9 @@ import sys
import copy import copy
from os.path import realpath, dirname from os.path import realpath, dirname
from typing import TYPE_CHECKING, Dict, Optional from typing import TYPE_CHECKING, Dict
import types
import inspect
from searx import logger, settings from searx import logger, settings
from searx.utils import load_module from searx.utils import load_module
@ -28,21 +30,23 @@ if TYPE_CHECKING:
logger = logger.getChild('engines') logger = logger.getChild('engines')
ENGINE_DIR = dirname(realpath(__file__)) ENGINE_DIR = dirname(realpath(__file__))
ENGINE_DEFAULT_ARGS = { ENGINE_DEFAULT_ARGS = {
# Common options in the engine module
"engine_type": "online", "engine_type": "online",
"inactive": False,
"disabled": False,
"timeout": settings["outgoing"]["request_timeout"],
"shortcut": "-",
"categories": ["general"],
"paging": False, "paging": False,
"safesearch": False,
"time_range_support": False, "time_range_support": False,
"safesearch": False,
# settings.yml
"categories": ["general"],
"enable_http": False, "enable_http": False,
"using_tor_proxy": False, "shortcut": "-",
"timeout": settings["outgoing"]["request_timeout"],
"display_error_messages": True, "display_error_messages": True,
"disabled": False,
"inactive": False,
"about": {},
"using_tor_proxy": False,
"send_accept_language_header": False, "send_accept_language_header": False,
"tokens": [], "tokens": [],
"about": {},
} }
# set automatically when an engine does not have any tab category # set automatically when an engine does not have any tab category
DEFAULT_CATEGORY = 'other' DEFAULT_CATEGORY = 'other'
@ -51,7 +55,7 @@ DEFAULT_CATEGORY = 'other'
# Defaults for the namespace of an engine module, see :py:func:`load_engine` # Defaults for the namespace of an engine module, see :py:func:`load_engine`
categories = {'general': []} categories = {'general': []}
engines: Dict[str, Engine] = {} engines: Dict[str, Engine | types.ModuleType] = {}
engine_shortcuts = {} engine_shortcuts = {}
"""Simple map of registered *shortcuts* to name of the engine (or ``None``). """Simple map of registered *shortcuts* to name of the engine (or ``None``).
@ -63,7 +67,19 @@ engine_shortcuts = {}
""" """
def load_engine(engine_data: dict) -> Optional[Engine]: def check_engine_module(module: types.ModuleType):
# probe unintentional name collisions / for example name collisions caused
# by import statements in the engine module ..
# network: https://github.com/searxng/searxng/issues/762#issuecomment-1605323861
obj = getattr(module, 'network', None)
if obj and inspect.ismodule(obj):
msg = f'type of {module.__name__}.network is a module ({obj.__name__}), expected a string'
# logger.error(msg)
raise TypeError(msg)
def load_engine(engine_data: dict) -> Engine | types.ModuleType | None:
"""Load engine from ``engine_data``. """Load engine from ``engine_data``.
:param dict engine_data: Attributes from YAML ``settings:engines/<engine>`` :param dict engine_data: Attributes from YAML ``settings:engines/<engine>``
@ -100,19 +116,20 @@ def load_engine(engine_data: dict) -> Optional[Engine]:
engine_data['name'] = engine_name engine_data['name'] = engine_name
# load_module # load_module
engine_module = engine_data.get('engine') module_name = engine_data.get('engine')
if engine_module is None: if module_name is None:
logger.error('The "engine" field is missing for the engine named "{}"'.format(engine_name)) logger.error('The "engine" field is missing for the engine named "{}"'.format(engine_name))
return None return None
try: try:
engine = load_module(engine_module + '.py', ENGINE_DIR) engine = load_module(module_name + '.py', ENGINE_DIR)
except (SyntaxError, KeyboardInterrupt, SystemExit, SystemError, ImportError, RuntimeError): except (SyntaxError, KeyboardInterrupt, SystemExit, SystemError, ImportError, RuntimeError):
logger.exception('Fatal exception in engine "{}"'.format(engine_module)) logger.exception('Fatal exception in engine "{}"'.format(module_name))
sys.exit(1) sys.exit(1)
except BaseException: except BaseException:
logger.exception('Cannot load engine "{}"'.format(engine_module)) logger.exception('Cannot load engine "{}"'.format(module_name))
return None return None
check_engine_module(engine)
update_engine_attributes(engine, engine_data) update_engine_attributes(engine, engine_data)
update_attributes_for_tor(engine) update_attributes_for_tor(engine)
@ -153,18 +170,18 @@ def set_loggers(engine, engine_name):
and not hasattr(module, "logger") and not hasattr(module, "logger")
): ):
module_engine_name = module_name.split(".")[-1] module_engine_name = module_name.split(".")[-1]
module.logger = logger.getChild(module_engine_name) module.logger = logger.getChild(module_engine_name) # type: ignore
def update_engine_attributes(engine: Engine, engine_data): def update_engine_attributes(engine: Engine | types.ModuleType, engine_data):
# set engine attributes from engine_data # set engine attributes from engine_data
for param_name, param_value in engine_data.items(): for param_name, param_value in engine_data.items():
if param_name == 'categories': if param_name == 'categories':
if isinstance(param_value, str): if isinstance(param_value, str):
param_value = list(map(str.strip, param_value.split(','))) param_value = list(map(str.strip, param_value.split(',')))
engine.categories = param_value engine.categories = param_value # type: ignore
elif hasattr(engine, 'about') and param_name == 'about': elif hasattr(engine, 'about') and param_name == 'about':
engine.about = {**engine.about, **engine_data['about']} engine.about = {**engine.about, **engine_data['about']} # type: ignore
else: else:
setattr(engine, param_name, param_value) setattr(engine, param_name, param_value)
@ -174,10 +191,10 @@ def update_engine_attributes(engine: Engine, engine_data):
setattr(engine, arg_name, copy.deepcopy(arg_value)) setattr(engine, arg_name, copy.deepcopy(arg_value))
def update_attributes_for_tor(engine: Engine) -> bool: def update_attributes_for_tor(engine: Engine | types.ModuleType):
if using_tor_proxy(engine) and hasattr(engine, 'onion_url'): if using_tor_proxy(engine) and hasattr(engine, 'onion_url'):
engine.search_url = engine.onion_url + getattr(engine, 'search_path', '') engine.search_url = engine.onion_url + getattr(engine, 'search_path', '') # type: ignore
engine.timeout += settings['outgoing'].get('extra_proxy_timeout', 0) engine.timeout += settings['outgoing'].get('extra_proxy_timeout', 0) # type: ignore
def is_missing_required_attributes(engine): def is_missing_required_attributes(engine):
@ -193,12 +210,12 @@ def is_missing_required_attributes(engine):
return missing return missing
def using_tor_proxy(engine: Engine): def using_tor_proxy(engine: Engine | types.ModuleType):
"""Return True if the engine configuration declares to use Tor.""" """Return True if the engine configuration declares to use Tor."""
return settings['outgoing'].get('using_tor_proxy') or getattr(engine, 'using_tor_proxy', False) return settings['outgoing'].get('using_tor_proxy') or getattr(engine, 'using_tor_proxy', False)
def is_engine_active(engine: Engine): def is_engine_active(engine: Engine | types.ModuleType):
# check if engine is inactive # check if engine is inactive
if engine.inactive is True: if engine.inactive is True:
return False return False
@ -210,7 +227,7 @@ def is_engine_active(engine: Engine):
return True return True
def register_engine(engine: Engine): def register_engine(engine: Engine | types.ModuleType):
if engine.name in engines: if engine.name in engines:
logger.error('Engine config error: ambiguous name: {0}'.format(engine.name)) logger.error('Engine config error: ambiguous name: {0}'.format(engine.name))
sys.exit(1) sys.exit(1)
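
The ``check_engine_module`` probe added in this file can be tried in isolation. The following is a minimal sketch, assuming two hypothetical engine modules built with ``types.ModuleType``; the stdlib ``json`` module stands in for ``searx.network``, so nothing here depends on SearXNG being installed.

```python
import inspect
import json
import types


def check_engine_module(module: types.ModuleType) -> None:
    """Raise TypeError when ``module.network`` is a module instead of a string."""
    obj = getattr(module, "network", None)
    if obj and inspect.ismodule(obj):
        raise TypeError(
            f"type of {module.__name__}.network is a module ({obj.__name__}), expected a string"
        )


# Hypothetical engine whose namespace was polluted by an import statement;
# ``json`` is only a stand-in for ``searx.network``.
broken = types.ModuleType("broken_engine")
broken.network = json

# Hypothetical engine that names its network with a plain string.
healthy = types.ModuleType("healthy_engine")
healthy.network = "wikipedia"

check_engine_module(healthy)  # passes silently
try:
    check_engine_module(broken)
except TypeError as exc:
    print(exc)
```

An engine without any ``network`` attribute also passes, because ``getattr`` falls back to ``None``.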


@ -14,7 +14,6 @@ from urllib.parse import urlencode, urljoin, urlparse
import lxml import lxml
import babel import babel
from searx import network
from searx.utils import extract_text, eval_xpath_list, eval_xpath_getindex from searx.utils import extract_text, eval_xpath_list, eval_xpath_getindex
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
from searx.locales import language_tag from searx.locales import language_tag
@ -45,13 +44,13 @@ main_wiki = 'wiki.archlinux.org'
def request(query, params): def request(query, params):
sxng_lang = params['searxng_locale'].split('-')[0] sxng_lang = params['searxng_locale'].split('-')[0]
netloc = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki) netloc: str = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki) # type: ignore
title = traits.custom['title'].get(sxng_lang, 'Special:Search') title: str = traits.custom['title'].get(sxng_lang, 'Special:Search') # type: ignore
base_url = 'https://' + netloc + '/index.php?' base_url = 'https://' + netloc + '/index.php?'
offset = (params['pageno'] - 1) * 20 offset = (params['pageno'] - 1) * 20
if netloc == main_wiki: if netloc == main_wiki:
eng_lang: str = traits.get_language(sxng_lang, 'English') eng_lang: str = traits.get_language(sxng_lang, 'English') # type: ignore
query += ' (' + eng_lang + ')' query += ' (' + eng_lang + ')'
elif netloc == 'wiki.archlinuxcn.org': elif netloc == 'wiki.archlinuxcn.org':
base_url = 'https://' + netloc + '/wzh/index.php?' base_url = 'https://' + netloc + '/wzh/index.php?'
@ -71,11 +70,11 @@ def request(query, params):
def response(resp): def response(resp):
results = [] results = []
dom = lxml.html.fromstring(resp.text) dom = lxml.html.fromstring(resp.text) # type: ignore
# get the base URL for the language in which request was made # get the base URL for the language in which request was made
sxng_lang = resp.search_params['searxng_locale'].split('-')[0] sxng_lang = resp.search_params['searxng_locale'].split('-')[0]
netloc = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki) netloc: str = traits.custom['wiki_netloc'].get(sxng_lang, main_wiki) # type: ignore
base_url = 'https://' + netloc + '/index.php?' base_url = 'https://' + netloc + '/index.php?'
for result in eval_xpath_list(dom, '//ul[@class="mw-search-results"]/li'): for result in eval_xpath_list(dom, '//ul[@class="mw-search-results"]/li'):
@ -83,7 +82,7 @@ def response(resp):
content = extract_text(result.xpath('.//div[@class="searchresult"]')) content = extract_text(result.xpath('.//div[@class="searchresult"]'))
results.append( results.append(
{ {
'url': urljoin(base_url, link.get('href')), 'url': urljoin(base_url, link.get('href')), # type: ignore
'title': extract_text(link), 'title': extract_text(link),
'content': content, 'content': content,
} }
@ -114,6 +113,8 @@ def fetch_traits(engine_traits: EngineTraits):
}, },
""" """
# pylint: disable=import-outside-toplevel
from searx.network import get # see https://github.com/searxng/searxng/issues/762
engine_traits.custom['wiki_netloc'] = {} engine_traits.custom['wiki_netloc'] = {}
engine_traits.custom['title'] = {} engine_traits.custom['title'] = {}
@ -125,11 +126,11 @@ def fetch_traits(engine_traits: EngineTraits):
'zh': 'Special:搜索', 'zh': 'Special:搜索',
} }
resp = network.get('https://wiki.archlinux.org/') resp = get('https://wiki.archlinux.org/')
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from wiki.archlinix.org is not OK.") print("ERROR: response from wiki.archlinix.org is not OK.")
dom = lxml.html.fromstring(resp.text) dom = lxml.html.fromstring(resp.text) # type: ignore
for a in eval_xpath_list(dom, "//a[@class='interlanguage-link-target']"): for a in eval_xpath_list(dom, "//a[@class='interlanguage-link-target']"):
sxng_tag = language_tag(babel.Locale.parse(a.get('lang'), sep='-')) sxng_tag = language_tag(babel.Locale.parse(a.get('lang'), sep='-'))
@ -143,9 +144,9 @@ def fetch_traits(engine_traits: EngineTraits):
print("ERROR: title tag from %s (%s) is unknown" % (netloc, sxng_tag)) print("ERROR: title tag from %s (%s) is unknown" % (netloc, sxng_tag))
continue continue
engine_traits.custom['wiki_netloc'][sxng_tag] = netloc engine_traits.custom['wiki_netloc'][sxng_tag] = netloc
engine_traits.custom['title'][sxng_tag] = title engine_traits.custom['title'][sxng_tag] = title # type: ignore
eng_tag = extract_text(eval_xpath_list(a, ".//span")) eng_tag = extract_text(eval_xpath_list(a, ".//span"))
engine_traits.languages[sxng_tag] = eng_tag engine_traits.languages[sxng_tag] = eng_tag # type: ignore
engine_traits.languages['en'] = 'English' engine_traits.languages['en'] = 'English'
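
Why moving the import into the function body removes the collision can be shown with a small sketch. The module sources and the ``load`` helper below are hypothetical, and the stdlib ``json`` module again stands in for ``searx.network``; the point is only that a top-level import binds a name in the module namespace, while an import inside a function does not.

```python
import types

# Stand-in for an engine with ``from searx import network`` at module level.
TOP_LEVEL_SOURCE = """
import json as network
"""

# Stand-in for an engine that imports only inside the function that needs it.
DEFERRED_SOURCE = """
def fetch():
    from json import dumps  # bound only in the local scope of fetch()
    return dumps({"ok": True})
"""


def load(name: str, source: str) -> types.ModuleType:
    """Hypothetical loader: execute ``source`` in a fresh module namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


top_level = load("top_level_engine", TOP_LEVEL_SOURCE)
deferred = load("deferred_engine", DEFERRED_SOURCE)

print(hasattr(top_level, "network"))  # the import leaked into the namespace
print(hasattr(deferred, "network"))   # nothing named ``network`` was bound
print(deferred.fetch())
```

Importing a single function at module level, as other engines in this commit do with ``from searx.network import get``, avoids the ``network`` name in the same way.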


@ -38,7 +38,6 @@ import babel
import babel.languages import babel.languages
from searx.utils import eval_xpath, extract_text, eval_xpath_list, eval_xpath_getindex from searx.utils import eval_xpath, extract_text, eval_xpath_list, eval_xpath_getindex
from searx import network
from searx.locales import language_tag, region_tag from searx.locales import language_tag, region_tag
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
@ -180,6 +179,10 @@ def request(query, params):
def response(resp): def response(resp):
# pylint: disable=too-many-locals,import-outside-toplevel
from searx.network import Request, multi_requests # see https://github.com/searxng/searxng/issues/762
results = [] results = []
result_len = 0 result_len = 0
@ -231,9 +234,9 @@ def response(resp):
# resolve all Bing redirections in parallel # resolve all Bing redirections in parallel
request_list = [ request_list = [
network.Request.get(u, allow_redirects=False, headers=resp.search_params['headers']) for u in url_to_resolve Request.get(u, allow_redirects=False, headers=resp.search_params['headers']) for u in url_to_resolve
] ]
response_list = network.multi_requests(request_list) response_list = multi_requests(request_list)
for i, redirect_response in enumerate(response_list): for i, redirect_response in enumerate(response_list):
if not isinstance(redirect_response, Exception): if not isinstance(redirect_response, Exception):
results[url_to_resolve_index[i]]['url'] = redirect_response.headers['location'] results[url_to_resolve_index[i]]['url'] = redirect_response.headers['location']
@ -272,16 +275,19 @@ def fetch_traits(engine_traits: EngineTraits):
def _fetch_traits(engine_traits: EngineTraits, url: str, xpath_language_codes: str, xpath_market_codes: str): def _fetch_traits(engine_traits: EngineTraits, url: str, xpath_language_codes: str, xpath_market_codes: str):
# pylint: disable=too-many-locals,import-outside-toplevel
from searx.network import get # see https://github.com/searxng/searxng/issues/762
# insert alias to map from a language (zh) to a language + script (zh_Hans) # insert alias to map from a language (zh) to a language + script (zh_Hans)
engine_traits.languages['zh'] = 'zh-hans' engine_traits.languages['zh'] = 'zh-hans'
resp = network.get(url) resp = get(url)
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from peertube is not OK.") print("ERROR: response from peertube is not OK.")
dom = html.fromstring(resp.text) dom = html.fromstring(resp.text) # type: ignore
map_lang = {'jp': 'ja'} map_lang = {'jp': 'ja'}
for td in eval_xpath(dom, xpath_language_codes): for td in eval_xpath(dom, xpath_language_codes):


@ -18,9 +18,9 @@ from urllib.parse import urlencode
import time import time
import babel import babel
from searx.exceptions import SearxEngineAPIException from searx.network import get, raise_for_httperror # see https://github.com/searxng/searxng/issues/762
from searx import network
from searx.utils import html_to_text from searx.utils import html_to_text
from searx.exceptions import SearxEngineAPIException
from searx.locales import region_tag, language_tag from searx.locales import region_tag, language_tag
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
@ -106,7 +106,7 @@ def request(query, params):
if not query: if not query:
return False return False
eng_region = traits.get_region(params['searxng_locale'], 'en_US') eng_region: str = traits.get_region(params['searxng_locale'], 'en_US') # type: ignore
eng_lang = traits.get_language(params['searxng_locale'], 'en') eng_lang = traits.get_language(params['searxng_locale'], 'en')
args = { args = {
@ -156,7 +156,7 @@ def response(resp):
if 'error' in search_res: if 'error' in search_res:
raise SearxEngineAPIException(search_res['error'].get('message')) raise SearxEngineAPIException(search_res['error'].get('message'))
network.raise_for_httperror(resp) raise_for_httperror(resp)
# parse results # parse results
for res in search_res.get('list', []): for res in search_res.get('list', []):
@ -218,11 +218,11 @@ def fetch_traits(engine_traits: EngineTraits):
""" """
resp = network.get('https://api.dailymotion.com/locales') resp = get('https://api.dailymotion.com/locales')
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from dailymotion/locales is not OK.") print("ERROR: response from dailymotion/locales is not OK.")
for item in resp.json()['list']: for item in resp.json()['list']: # type: ignore
eng_tag = item['locale'] eng_tag = item['locale']
if eng_tag in ('en_EN', 'ar_AA'): if eng_tag in ('en_EN', 'ar_AA'):
continue continue
@ -241,11 +241,11 @@ def fetch_traits(engine_traits: EngineTraits):
locale_lang_list = [x.split('_')[0] for x in engine_traits.regions.values()] locale_lang_list = [x.split('_')[0] for x in engine_traits.regions.values()]
resp = network.get('https://api.dailymotion.com/languages') resp = get('https://api.dailymotion.com/languages')
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from dailymotion/languages is not OK.") print("ERROR: response from dailymotion/languages is not OK.")
for item in resp.json()['list']: for item in resp.json()['list']: # type: ignore
eng_tag = item['code'] eng_tag = item['code']
if eng_tag in locale_lang_list: if eng_tag in locale_lang_list:
sxng_tag = language_tag(babel.Locale.parse(eng_tag)) sxng_tag = language_tag(babel.Locale.parse(eng_tag))


@ -13,17 +13,17 @@ import babel
import lxml.html import lxml.html
from searx import ( from searx import (
network,
locales, locales,
redislib, redislib,
external_bang, external_bang,
) )
from searx import redisdb
from searx.utils import ( from searx.utils import (
eval_xpath, eval_xpath,
eval_xpath_getindex, eval_xpath_getindex,
extract_text, extract_text,
) )
from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx import redisdb
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
from searx.exceptions import SearxEngineAPIException from searx.exceptions import SearxEngineAPIException
@ -95,8 +95,8 @@ def get_vqd(query, headers):
return value return value
query_url = 'https://duckduckgo.com/?q={query}&atb=v290-5'.format(query=urlencode({'q': query})) query_url = 'https://duckduckgo.com/?q={query}&atb=v290-5'.format(query=urlencode({'q': query}))
res = network.get(query_url, headers=headers) res = get(query_url, headers=headers)
content = res.text content = res.text # type: ignore
if content.find('vqd=\"') == -1: if content.find('vqd=\"') == -1:
raise SearxEngineAPIException('Request failed') raise SearxEngineAPIException('Request failed')
value = content[content.find('vqd=\"') + 5 :] value = content[content.find('vqd=\"') + 5 :]
@ -139,7 +139,9 @@ def get_ddg_lang(eng_traits: EngineTraits, sxng_locale, default='en_US'):
params['cookies']['kl'] = eng_region # 'ar-es' params['cookies']['kl'] = eng_region # 'ar-es'
""" """
return eng_traits.custom['lang_region'].get(sxng_locale, eng_traits.get_language(sxng_locale, default)) return eng_traits.custom['lang_region'].get( # type: ignore
sxng_locale, eng_traits.get_language(sxng_locale, default)
)
ddg_reg_map = { ddg_reg_map = {
@ -358,13 +360,13 @@ def fetch_traits(engine_traits: EngineTraits):
engine_traits.all_locale = 'wt-wt' engine_traits.all_locale = 'wt-wt'
# updated from u588 to u661 / should be updated automatically? # updated from u588 to u661 / should be updated automatically?
resp = network.get('https://duckduckgo.com/util/u661.js') resp = get('https://duckduckgo.com/util/u661.js')
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from DuckDuckGo is not OK.") print("ERROR: response from DuckDuckGo is not OK.")
pos = resp.text.find('regions:{') + 8 pos = resp.text.find('regions:{') + 8 # type: ignore
js_code = resp.text[pos:] js_code = resp.text[pos:] # type: ignore
pos = js_code.find('}') + 1 pos = js_code.find('}') + 1
regions = json.loads(js_code[:pos]) regions = json.loads(js_code[:pos])
@ -399,8 +401,8 @@ def fetch_traits(engine_traits: EngineTraits):
engine_traits.custom['lang_region'] = {} engine_traits.custom['lang_region'] = {}
pos = resp.text.find('languages:{') + 10 pos = resp.text.find('languages:{') + 10 # type: ignore
js_code = resp.text[pos:] js_code = resp.text[pos:] # type: ignore
pos = js_code.find('}') + 1 pos = js_code.find('}') + 1
js_code = '{"' + js_code[1:pos].replace(':', '":').replace(',', ',"') js_code = '{"' + js_code[1:pos].replace(':', '":').replace(',', ',"')
languages = json.loads(js_code) languages = json.loads(js_code)


@ -23,7 +23,7 @@ import babel.languages
from searx.utils import extract_text, eval_xpath, eval_xpath_list, eval_xpath_getindex from searx.utils import extract_text, eval_xpath, eval_xpath_list, eval_xpath_getindex
from searx.locales import language_tag, region_tag, get_offical_locales from searx.locales import language_tag, region_tag, get_offical_locales
from searx import network from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx.exceptions import SearxEngineCaptchaException from searx.exceptions import SearxEngineCaptchaException
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
@ -419,11 +419,11 @@ def fetch_traits(engine_traits: EngineTraits, add_domains: bool = True):
engine_traits.custom['supported_domains'] = {} engine_traits.custom['supported_domains'] = {}
resp = network.get('https://www.google.com/preferences') resp = get('https://www.google.com/preferences')
if not resp.ok: if not resp.ok: # type: ignore
raise RuntimeError("Response from Google's preferences is not OK.") raise RuntimeError("Response from Google's preferences is not OK.")
dom = html.fromstring(resp.text) dom = html.fromstring(resp.text) # type: ignore
# supported language codes # supported language codes
@ -474,18 +474,18 @@ def fetch_traits(engine_traits: EngineTraits, add_domains: bool = True):
# supported domains # supported domains
if add_domains: if add_domains:
resp = network.get('https://www.google.com/supported_domains') resp = get('https://www.google.com/supported_domains')
if not resp.ok: if not resp.ok: # type: ignore
raise RuntimeError("Response from https://www.google.com/supported_domains is not OK.") raise RuntimeError("Response from https://www.google.com/supported_domains is not OK.")
for domain in resp.text.split(): for domain in resp.text.split(): # type: ignore
domain = domain.strip() domain = domain.strip()
if not domain or domain in [ if not domain or domain in [
'.google.com', '.google.com',
]: ]:
continue continue
region = domain.split('.')[-1].upper() region = domain.split('.')[-1].upper()
engine_traits.custom['supported_domains'][region] = 'www' + domain engine_traits.custom['supported_domains'][region] = 'www' + domain # type: ignore
if region == 'HK': if region == 'HK':
# There is no google.cn, we use .com.hk for zh-CN # There is no google.cn, we use .com.hk for zh-CN
engine_traits.custom['supported_domains']['CN'] = 'www' + domain engine_traits.custom['supported_domains']['CN'] = 'www' + domain # type: ignore
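
The loop above maps each domain suffix to a region code and reuses the Hong Kong domain for `CN`. The same logic can be sketched as a standalone function. `map_supported_domains` and the sample text are hypothetical, and no request to Google is made here.

```python
def map_supported_domains(text: str) -> dict:
    """Map whitespace-separated domain suffixes to {REGION: 'www' + suffix}."""
    domains = {}
    for domain in text.split():
        domain = domain.strip()
        if not domain or domain in ['.google.com']:
            continue
        # The last label of the suffix is used as the region code.
        region = domain.split('.')[-1].upper()
        domains[region] = 'www' + domain
        if region == 'HK':
            # There is no google.cn, so .com.hk is reused for zh-CN.
            domains['CN'] = 'www' + domain
    return domains


# Hypothetical excerpt in the format the loop expects.
sample = ".google.com\n.google.de\n.google.com.hk\n"
supported = map_supported_domains(sample)
```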

@ -13,7 +13,7 @@ from dateutil.relativedelta import relativedelta
import babel import babel
from searx import network from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx.locales import language_tag from searx.locales import language_tag
from searx.utils import html_to_text from searx.utils import html_to_text
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
@ -147,32 +147,30 @@ def fetch_traits(engine_traits: EngineTraits):
https://framagit.org/framasoft/peertube/search-index/-/commit/8ed5c729#3d8747f9a60695c367c70bb64efba8f403721fad_0_291 https://framagit.org/framasoft/peertube/search-index/-/commit/8ed5c729#3d8747f9a60695c367c70bb64efba8f403721fad_0_291
""" """
resp = network.get( resp = get(
'https://framagit.org/framasoft/peertube/search-index/-/raw/master/client/src/components/Filters.vue', 'https://framagit.org/framasoft/peertube/search-index/-/raw/master/client/src/components/Filters.vue',
# the response from search-index repository is very slow # the response from search-index repository is very slow
timeout=60, timeout=60,
) )
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from peertube is not OK.") print("ERROR: response from peertube is not OK.")
return return
js_lang = re.search(r"videoLanguages \(\)[^\n]+(.*?)\]", resp.text, re.DOTALL) js_lang = re.search(r"videoLanguages \(\)[^\n]+(.*?)\]", resp.text, re.DOTALL) # type: ignore
if not js_lang: if not js_lang:
print("ERROR: can't determine languages from peertube") print("ERROR: can't determine languages from peertube")
return return
for lang in re.finditer(r"\{ id: '([a-z]+)', label:", js_lang.group(1)): for lang in re.finditer(r"\{ id: '([a-z]+)', label:", js_lang.group(1)):
eng_tag = lang.group(1)
if eng_tag == 'oc':
# Occitan is not known by babel, its closest relative is Catalan
# but 'ca' is already in the list of engine_traits.languages -->
# 'oc' will be ignored.
continue
try: try:
eng_tag = lang.group(1)
if eng_tag == 'oc':
# Occitan is not known by babel, its closest relative is Catalan
# but 'ca' is already in the list of engine_traits.languages -->
# 'oc' will be ignored.
continue
sxng_tag = language_tag(babel.Locale.parse(eng_tag)) sxng_tag = language_tag(babel.Locale.parse(eng_tag))
except babel.UnknownLocaleError: except babel.UnknownLocaleError:
print("ERROR: %s is unknown by babel" % eng_tag) print("ERROR: %s is unknown by babel" % eng_tag)
continue continue
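
The two regular expressions in this hunk can be tried without fetching `Filters.vue`. The sketch below runs them against a made-up Vue fragment. The fragment's layout and the helper `extract_language_ids` are assumptions for illustration, and the babel lookup is left out.

```python
import re

# Hypothetical fragment; the real Filters.vue may be laid out differently.
SAMPLE = """
videoLanguages () {
  return [
    { id: 'en', label: this.$gettext('English') },
    { id: 'oc', label: this.$gettext('Occitan') },
    { id: 'fr', label: this.$gettext('French') }
  ]
}
"""


def extract_language_ids(vue_source: str, skip=('oc',)) -> list:
    """Return the language ids listed in the videoLanguages() array."""
    # Same patterns as in the patch: capture everything up to the first ']'.
    js_lang = re.search(r"videoLanguages \(\)[^\n]+(.*?)\]", vue_source, re.DOTALL)
    if not js_lang:
        return []
    ids = []
    for lang in re.finditer(r"\{ id: '([a-z]+)', label:", js_lang.group(1)):
        eng_tag = lang.group(1)
        if eng_tag in skip:
            # 'oc' is skipped before any locale parsing, as in the new code.
            continue
        ids.append(eng_tag)
    return ids


language_ids = extract_language_ids(SAMPLE)
```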

@ -91,8 +91,8 @@ import dateutil.parser
import lxml.html import lxml.html
import babel import babel
from searx import network
from searx.utils import extract_text, eval_xpath, gen_useragent from searx.utils import extract_text, eval_xpath, gen_useragent
from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx.exceptions import SearxEngineCaptchaException from searx.exceptions import SearxEngineCaptchaException
from searx.locales import region_tag from searx.locales import region_tag
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
@ -211,25 +211,25 @@ def get_sc_code(searxng_locale, params):
get_sc_url = base_url + '/?sc=%s' % (sc_code) get_sc_url = base_url + '/?sc=%s' % (sc_code)
logger.debug("query new sc time-stamp ... %s", get_sc_url) logger.debug("query new sc time-stamp ... %s", get_sc_url)
logger.debug("headers: %s", headers) logger.debug("headers: %s", headers)
resp = network.get(get_sc_url, headers=headers) resp = get(get_sc_url, headers=headers)
# ?? x = network.get('https://www.startpage.com/sp/cdn/images/filter-chevron.svg', headers=headers) # ?? x = network.get('https://www.startpage.com/sp/cdn/images/filter-chevron.svg', headers=headers)
# ?? https://www.startpage.com/sp/cdn/images/filter-chevron.svg # ?? https://www.startpage.com/sp/cdn/images/filter-chevron.svg
# ?? ping-back URL: https://www.startpage.com/sp/pb?sc=TLsB0oITjZ8F21 # ?? ping-back URL: https://www.startpage.com/sp/pb?sc=TLsB0oITjZ8F21
if str(resp.url).startswith('https://www.startpage.com/sp/captcha'): if str(resp.url).startswith('https://www.startpage.com/sp/captcha'): # type: ignore
raise SearxEngineCaptchaException( raise SearxEngineCaptchaException(
message="get_sc_code: got redirected to https://www.startpage.com/sp/captcha", message="get_sc_code: got redirected to https://www.startpage.com/sp/captcha",
) )
dom = lxml.html.fromstring(resp.text) dom = lxml.html.fromstring(resp.text) # type: ignore
try: try:
sc_code = eval_xpath(dom, search_form_xpath + '//input[@name="sc"]/@value')[0] sc_code = eval_xpath(dom, search_form_xpath + '//input[@name="sc"]/@value')[0]
except IndexError as exc: except IndexError as exc:
logger.debug("suspend startpage API --> https://github.com/searxng/searxng/pull/695") logger.debug("suspend startpage API --> https://github.com/searxng/searxng/pull/695")
raise SearxEngineCaptchaException( raise SearxEngineCaptchaException(
message="get_sc_code: [PR-695] query new sc time-stamp failed! (%s)" % resp.url, message="get_sc_code: [PR-695] query new sc time-stamp failed! (%s)" % resp.url, # type: ignore
) from exc ) from exc
sc_code_ts = time() sc_code_ts = time()
@ -350,7 +350,7 @@ def _response_cat_web(dom):
title = extract_text(link) title = extract_text(link)
if eval_xpath(result, content_xpath): if eval_xpath(result, content_xpath):
content = extract_text(eval_xpath(result, content_xpath)) content: str = extract_text(eval_xpath(result, content_xpath)) # type: ignore
else: else:
content = '' content = ''
@ -374,7 +374,7 @@ def _response_cat_web(dom):
date_string = content[0 : date_pos - 5] date_string = content[0 : date_pos - 5]
# calculate datetime # calculate datetime
published_date = datetime.now() - timedelta(days=int(re.match(r'\d+', date_string).group())) published_date = datetime.now() - timedelta(days=int(re.match(r'\d+', date_string).group())) # type: ignore
# fix content string # fix content string
content = content[date_pos:] content = content[date_pos:]
@ -399,12 +399,12 @@ def fetch_traits(engine_traits: EngineTraits):
'User-Agent': gen_useragent(), 'User-Agent': gen_useragent(),
'Accept-Language': "en-US,en;q=0.5", # bing needs to set the English language 'Accept-Language': "en-US,en;q=0.5", # bing needs to set the English language
} }
resp = network.get('https://www.startpage.com/do/settings', headers=headers) resp = get('https://www.startpage.com/do/settings', headers=headers)
if not resp.ok: if not resp.ok: # type: ignore
print("ERROR: response from Startpage is not OK.") print("ERROR: response from Startpage is not OK.")
dom = lxml.html.fromstring(resp.text) dom = lxml.html.fromstring(resp.text) # type: ignore
# regions # regions
@ -443,8 +443,10 @@ def fetch_traits(engine_traits: EngineTraits):
# get the native name of every language known by babel # get the native name of every language known by babel
for lang_code in filter(lambda lang_code: lang_code.find('_') == -1, babel.localedata.locale_identifiers()): for lang_code in filter(
native_name = babel.Locale(lang_code).get_language_name().lower() lambda lang_code: lang_code.find('_') == -1, babel.localedata.locale_identifiers() # type: ignore
):
native_name = babel.Locale(lang_code).get_language_name().lower() # type: ignore
# add native name exactly as it is # add native name exactly as it is
catalog_engine2code[native_name] = lang_code catalog_engine2code[native_name] = lang_code
@ -478,7 +480,7 @@ def fetch_traits(engine_traits: EngineTraits):
eng_tag = option.get('value') eng_tag = option.get('value')
if eng_tag in skip_eng_tags: if eng_tag in skip_eng_tags:
continue continue
name = extract_text(option).lower() name = extract_text(option).lower() # type: ignore
sxng_tag = catalog_engine2code.get(eng_tag) sxng_tag = catalog_engine2code.get(eng_tag)
if sxng_tag is None: if sxng_tag is None:

@ -61,7 +61,7 @@ import babel
from lxml import html from lxml import html
from searx import utils from searx import utils
from searx import network from searx import network as _network
from searx import locales from searx import locales
from searx.enginelib.traits import EngineTraits from searx.enginelib.traits import EngineTraits
@ -180,7 +180,7 @@ def response(resp):
): ):
return [] return []
network.raise_for_httperror(resp) _network.raise_for_httperror(resp)
api_result = resp.json() api_result = resp.json()
title = utils.html_to_text(api_result.get('titles', {}).get('display') or api_result.get('title')) title = utils.html_to_text(api_result.get('titles', {}).get('display') or api_result.get('title'))
@ -267,7 +267,7 @@ def fetch_wikimedia_traits(engine_traits: EngineTraits):
for sxng_tag in sxng_tag_list: for sxng_tag in sxng_tag_list:
engine_traits.regions[sxng_tag] = eng_tag engine_traits.regions[sxng_tag] = eng_tag
resp = network.get(list_of_wikipedias) resp = _network.get(list_of_wikipedias)
if not resp.ok: if not resp.ok:
print("ERROR: response from Wikipedia is not OK.") print("ERROR: response from Wikipedia is not OK.")
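
The import changes in this patch (`from searx.network import get`, `from searx import network as _network`) suggest that the module-level name `network` inside an engine can be overwritten after import. The following is a sketch of that failure mode using only the standard library. It assumes, without confirming it against the SearXNG loader, that an attribute named `network` is assigned on the engine module. `json` stands in for `searx.network`, and all names are hypothetical.

```python
import types

ENGINE_SOURCE = '''
import json as network          # stand-in for: from searx import network
from json import dumps as get   # stand-in for: from searx.network import get

def via_module():
    return network.dumps([1])   # looks up the global name "network" at call time

def via_name():
    return get([1])             # bound to the function itself at import time
'''

engine = types.ModuleType("hypothetical_engine")
exec(ENGINE_SOURCE, engine.__dict__)

# Assumption: a loader stores a per-engine setting under the same name.
engine.network = "hypothetical_network_name"

try:
    engine.via_module()
    module_call_failed = False
except AttributeError:
    # The global "network" is now a str, which has no attribute "dumps".
    module_call_failed = True

direct_result = engine.via_name()
```

Importing the function directly, or importing the module under an alias such as `_network`, keeps the reference out of reach of the overwritten name.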

@ -209,9 +209,7 @@ SCHEMA = {
'enable_http2': SettingsValue(bool, True), 'enable_http2': SettingsValue(bool, True),
'verify': SettingsValue((bool, str), True), 'verify': SettingsValue((bool, str), True),
'max_request_timeout': SettingsValue((None, numbers.Real), None), 'max_request_timeout': SettingsValue((None, numbers.Real), None),
# Magic number kept from previous code
'pool_connections': SettingsValue(int, 100), 'pool_connections': SettingsValue(int, 100),
# Picked from constructor
'pool_maxsize': SettingsValue(int, 10), 'pool_maxsize': SettingsValue(int, 10),
'keepalive_expiry': SettingsValue(numbers.Real, 5.0), 'keepalive_expiry': SettingsValue(numbers.Real, 5.0),
# default maximum redirect # default maximum redirect