diff --git a/.mailmap b/.mailmap index 7fa9d41fbdaf945c807700300c1b3eb4289eb218..29ddeb1bf015c46f6316b6d4a436c94bdb9b3ae9 100644 --- a/.mailmap +++ b/.mailmap @@ -186,6 +186,9 @@ Uwe Kleine-König Uwe Kleine-König Uwe Kleine-König Valdis Kletnieks +Vinod Koul +Vinod Koul +Vinod Koul Viresh Kumar Viresh Kumar Viresh Kumar diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 708dc4c166e487c77aa6f6f4adf5189ab84f3f09..2754fe83f0d449623b7e0ac1fe4df5f061cdf773 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -64,8 +64,6 @@ auxdisplay/ - misc. LCD driver documentation (cfag12864b, ks0108). backlight/ - directory with info on controlling backlights in flat panel displays -bcache.txt - - Block-layer cache on fast SSDs to improve slow (raid) I/O performance. block/ - info on the Block I/O (BIO) layer. blockdev/ @@ -78,18 +76,10 @@ bus-devices/ - directory with info on TI GPMC (General Purpose Memory Controller) bus-virt-phys-mapping.txt - how to access I/O mapped memory from within device drivers. -cachetlb.txt - - describes the cache/TLB flushing interfaces Linux uses. cdrom/ - directory with information on the CD-ROM drivers that Linux has. cgroup-v1/ - cgroups v1 features, including cpusets and memory controller. -cgroup-v2.txt - - cgroups v2 features, including cpusets and memory controller. -circular-buffers.txt - - how to make use of the existing circular buffer infrastructure -clk.txt - - info on the common clock framework cma/ - Continuous Memory Area (CMA) debugfs interface. conf.py diff --git a/Documentation/ABI/obsolete/sysfs-gpio b/Documentation/ABI/obsolete/sysfs-gpio index 32513dc2eec9bdbf36ebc5117ece8f3577621e94..40d41ea1a3f52f515ac228c1cffa848deb6ede74 100644 --- a/Documentation/ABI/obsolete/sysfs-gpio +++ b/Documentation/ABI/obsolete/sysfs-gpio @@ -11,7 +11,7 @@ Description: Kernel code may export it for complete or partial access. GPIOs are identified as they are inside the kernel, using integers in - the range 0..INT_MAX. See Documentation/gpio/gpio.txt for more information. + the range 0..INT_MAX. See Documentation/gpio for more information. /sys/class/gpio /export ... asks the kernel to export a GPIO to userspace diff --git a/Documentation/ABI/removed/sysfs-bus-nfit b/Documentation/ABI/removed/sysfs-bus-nfit new file mode 100644 index 0000000000000000000000000000000000000000..ae8c1ca53828792274cc680bb01d15f8411edc79 --- /dev/null +++ b/Documentation/ABI/removed/sysfs-bus-nfit @@ -0,0 +1,17 @@ +What: /sys/bus/nd/devices/regionX/nfit/ecc_unit_size +Date: Aug, 2017 +KernelVersion: v4.14 (Removed v4.18) +Contact: linux-nvdimm@lists.01.org +Description: + (RO) Size of a write request to a DIMM that will not incur a + read-modify-write cycle at the memory controller. + + When the nfit driver initializes it runs an ARS (Address Range + Scrub) operation across every pmem range. Part of that process + involves determining the ARS capabilities of a given address + range. One of the capabilities that is reported is the 'Clear + Uncorrectable Error Range Length Unit Size' (see: ACPI 6.2 + section 9.20.7.4 Function Index 1 - Query ARS Capabilities). + This property indicates the boundary at which the NVDIMM may + need to perform read-modify-write cycles to maintain ECC (Error + Correcting Code) blocks. diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index 5b2d0f08867cd899df072f89a059995944fb8eec..3e90e1f3bf0a004edbcd2d5b89eed1d05b7aa784 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -90,4 +90,4 @@ Date: December 2009 Contact: Lee Schermerhorn Description: The node's huge page size control/query attributes. - See Documentation/vm/hugetlbpage.txt \ No newline at end of file + See Documentation/admin-guide/mm/hugetlbpage.rst \ No newline at end of file diff --git a/Documentation/ABI/testing/evm b/Documentation/ABI/testing/evm index d12cb2eae9ee658e552c81c9009e2759a9142d58..201d10319fa18b588a42246814b13b67047a7dd3 100644 --- a/Documentation/ABI/testing/evm +++ b/Documentation/ABI/testing/evm @@ -57,3 +57,16 @@ Description: dracut (via 97masterkey and 98integrity) and systemd (via core/ima-setup) have support for loading keys at boot time. + +What: security/integrity/evm/evm_xattrs +Date: April 2018 +Contact: Matthew Garrett +Description: + Shows the set of extended attributes used to calculate or + validate the EVM signature, and allows additional attributes + to be added at runtime. Any signatures generated after + additional attributes are added (and on files posessing those + additional attributes) will only be valid if the same + additional attributes are configured on system boot. Writing + a single period (.) will lock the xattr list from any further + modification. diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy index b8465e00ba5fb7d06e434e9508302b71b1809f5b..74c6702de74e24369bc4d3e314d534898052ec2b 100644 --- a/Documentation/ABI/testing/ima_policy +++ b/Documentation/ABI/testing/ima_policy @@ -21,7 +21,7 @@ Description: audit | hash | dont_hash condition:= base | lsm [option] base: [[func=] [mask=] [fsmagic=] [fsuuid=] [uid=] - [euid=] [fowner=]] + [euid=] [fowner=] [fsname=]] lsm: [[subj_user=] [subj_role=] [subj_type=] [obj_user=] [obj_role=] [obj_type=]] option: [[appraise_type=]] [permit_directio] diff --git a/Documentation/ABI/testing/sysfs-bus-iio b/Documentation/ABI/testing/sysfs-bus-iio index 6a5f34b4d5b96d1a522ee4172b3712677e659060..731146c3b1384cc348e0a2289aaa9d2967ead5c1 100644 --- a/Documentation/ABI/testing/sysfs-bus-iio +++ b/Documentation/ABI/testing/sysfs-bus-iio @@ -190,6 +190,13 @@ Description: but should match other such assignments on device). Units after application of scale and offset are m/s^2. +What: /sys/bus/iio/devices/iio:deviceX/in_angl_raw +KernelVersion: 4.17 +Contact: linux-iio@vger.kernel.org +Description: + Angle of rotation. Units after application of scale and offset + are radians. + What: /sys/bus/iio/devices/iio:deviceX/in_anglvel_x_raw What: /sys/bus/iio/devices/iio:deviceX/in_anglvel_y_raw What: /sys/bus/iio/devices/iio:deviceX/in_anglvel_z_raw @@ -297,6 +304,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_pressure_offset What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_offset What: /sys/bus/iio/devices/iio:deviceX/in_magn_offset What: /sys/bus/iio/devices/iio:deviceX/in_rot_offset +What: /sys/bus/iio/devices/iio:deviceX/in_angl_offset KernelVersion: 2.6.35 Contact: linux-iio@vger.kernel.org Description: @@ -350,6 +358,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_humidityrelative_scale What: /sys/bus/iio/devices/iio:deviceX/in_velocity_sqrt(x^2+y^2+z^2)_scale What: /sys/bus/iio/devices/iio:deviceX/in_illuminance_scale What: /sys/bus/iio/devices/iio:deviceX/in_countY_scale +What: /sys/bus/iio/devices/iio:deviceX/in_angl_scale KernelVersion: 2.6.35 Contact: linux-iio@vger.kernel.org Description: diff --git a/Documentation/ABI/testing/sysfs-bus-nfit b/Documentation/ABI/testing/sysfs-bus-nfit index 619eb8ca0f991a91625450e0afe4cd9d247f5ba1..a1cb44dcb90808201f58d18f9006a229a9be8296 100644 --- a/Documentation/ABI/testing/sysfs-bus-nfit +++ b/Documentation/ABI/testing/sysfs-bus-nfit @@ -212,22 +212,3 @@ Description: range. Used by NVDIMM Region Mapping Structure to uniquely refer to this structure. Value of 0 is reserved and not used as an index. - - -What: /sys/bus/nd/devices/regionX/nfit/ecc_unit_size -Date: Aug, 2017 -KernelVersion: v4.14 -Contact: linux-nvdimm@lists.01.org -Description: - (RO) Size of a write request to a DIMM that will not incur a - read-modify-write cycle at the memory controller. - - When the nfit driver initializes it runs an ARS (Address Range - Scrub) operation across every pmem range. Part of that process - involves determining the ARS capabilities of a given address - range. One of the capabilities that is reported is the 'Clear - Uncorrectable Error Range Length Unit Size' (see: ACPI 6.2 - section 9.20.7.4 Function Index 1 - Query ARS Capabilities). - This property indicates the boundary at which the NVDIMM may - need to perform read-modify-write cycles to maintain ECC (Error - Correcting Code) blocks. diff --git a/Documentation/ABI/testing/sysfs-bus-rpmsg b/Documentation/ABI/testing/sysfs-bus-rpmsg index 189e419a5a2d96543a2765eb1509c6d322941829..990fcc42093539b77ac3d4918d0c700a1be410c2 100644 --- a/Documentation/ABI/testing/sysfs-bus-rpmsg +++ b/Documentation/ABI/testing/sysfs-bus-rpmsg @@ -73,3 +73,23 @@ Description: This sysfs entry tells us whether the channel is a local server channel that is announced (values are either true or false). + +What: /sys/bus/rpmsg/devices/.../driver_override +Date: April 2018 +KernelVersion: 4.18 +Contact: Bjorn Andersson +Description: + Every rpmsg device is a communication channel with a remote + processor. Channels are identified by a textual name (see + /sys/bus/rpmsg/devices/.../name above) and have a local + ("source") rpmsg address, and remote ("destination") rpmsg + address. + + The listening entity (or client) which communicates with a + remote processor is referred as rpmsg driver. The rpmsg device + and rpmsg driver are matched based on rpmsg device name and + rpmsg driver ID table. + + This sysfs entry allows the rpmsg driver for a rpmsg device + to be specified which will override standard OF, ID table + and name matching. diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index c702c78f24d82448e7124704be6526b546cac24c..08d456e07b538122b6f0426b9b3cfc099fc525ce 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb @@ -189,6 +189,28 @@ Description: The file will read "hotplug", "wired" and "not used" if the information is available, and "unknown" otherwise. +What: /sys/bus/usb/devices/.../(hub interface)/portX/quirks +Date: May 2018 +Contact: Nicolas Boichat +Description: + In some cases, we care about time-to-active for devices + connected on a specific port (e.g. non-standard USB port like + pogo pins), where the device to be connected is known in + advance, and behaves well according to the specification. + This attribute is a bit-field that controls the behavior of + a specific port: + - Bit 0 of this field selects the "old" enumeration scheme, + as it is considerably faster (it only causes one USB reset + instead of 2). + The old enumeration scheme can also be selected globally + using /sys/module/usbcore/parameters/old_scheme_first, but + it is often not desirable as the new scheme was introduced to + increase compatibility with more devices. + - Bit 1 reduces TRSTRCY to the 10 ms that are required by the + USB 2.0 specification, instead of the 50 ms that are normally + used to help make enumeration work better on some high speed + devices. + What: /sys/bus/usb/devices/.../(hub interface)/portX/over_current_count Date: February 2018 Contact: Richard Leitner @@ -236,3 +258,21 @@ Description: Supported values are 0 - 15. More information on how besl values map to microseconds can be found in USB 2.0 ECN Errata for Link Power Management, section 4.10) + +What: /sys/bus/usb/devices/.../rx_lanes +Date: March 2018 +Contact: Mathias Nyman +Description: + Number of rx lanes the device is using. + USB 3.2 adds Dual-lane support, 2 rx and 2 tx lanes over Type-C. + Inter-Chip SSIC devices support asymmetric lanes up to 4 lanes per + direction. Devices before USB 3.2 are single lane (rx_lanes = 1) + +What: /sys/bus/usb/devices/.../tx_lanes +Date: March 2018 +Contact: Mathias Nyman +Description: + Number of tx lanes the device is using. + USB 3.2 adds Dual-lane support, 2 rx and 2 tx -lanes over Type-C. + Inter-Chip SSIC devices support asymmetric lanes up to 4 lanes per + direction. Devices before USB 3.2 are single lane (tx_lanes = 1) diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd index f34e592301d1dbd9190d8557e43fd05bcde08f80..3bc7c0a95c927896fdee10193108ba802d45c2e3 100644 --- a/Documentation/ABI/testing/sysfs-class-mtd +++ b/Documentation/ABI/testing/sysfs-class-mtd @@ -232,3 +232,11 @@ Description: of the parent (another partition or a flash device) in bytes. This attribute is absent on flash devices, so it can be used to distinguish them from partitions. + +What: /sys/class/mtd/mtdX/oobavail +Date: April 2018 +KernelVersion: 4.16 +Contact: linux-mtd@lists.infradead.org +Description: + Number of bytes available for a client to place data into + the out of band area. diff --git a/Documentation/ABI/testing/sysfs-class-power b/Documentation/ABI/testing/sysfs-class-power index f85ce9e327b98b94ab1f119f9ec15edf6a13ccfa..5e23e22dce1b514d133ac6371163da37fe5a7ed1 100644 --- a/Documentation/ABI/testing/sysfs-class-power +++ b/Documentation/ABI/testing/sysfs-class-power @@ -1,3 +1,458 @@ +===== General Properties ===== + +What: /sys/class/power_supply//manufacturer +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the name of the device manufacturer. + + Access: Read + Valid values: Represented as string + +What: /sys/class/power_supply//model_name +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the name of the device model. + + Access: Read + Valid values: Represented as string + +What: /sys/class/power_supply//serial_number +Date: January 2008 +Contact: linux-pm@vger.kernel.org +Description: + Reports the serial number of the device. + + Access: Read + Valid values: Represented as string + +What: /sys/class/power_supply//type +Date: May 2010 +Contact: linux-pm@vger.kernel.org +Description: + Describes the main type of the supply. + + Access: Read + Valid values: "Battery", "UPS", "Mains", "USB" + +===== Battery Properties ===== + +What: /sys/class/power_supply//capacity +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Fine grain representation of battery capacity. + Access: Read + Valid values: 0 - 100 (percent) + +What: /sys/class/power_supply//capacity_alert_max +Date: July 2012 +Contact: linux-pm@vger.kernel.org +Description: + Maximum battery capacity trip-wire value where the supply will + notify user-space of the event. This is normally used for the + battery discharging scenario where user-space needs to know the + battery has dropped to an upper level so it can take + appropriate action (e.g. warning user that battery level is + low). + + Access: Read, Write + Valid values: 0 - 100 (percent) + +What: /sys/class/power_supply//capacity_alert_min +Date: July 2012 +Contact: linux-pm@vger.kernel.org +Description: + Minimum battery capacity trip-wire value where the supply will + notify user-space of the event. This is normally used for the + battery discharging scenario where user-space needs to know the + battery has dropped to a lower level so it can take + appropriate action (e.g. warning user that battery level is + critically low). + + Access: Read, Write + Valid values: 0 - 100 (percent) + +What: /sys/class/power_supply//capacity_level +Date: June 2009 +Contact: linux-pm@vger.kernel.org +Description: + Coarse representation of battery capacity. + + Access: Read + Valid values: "Unknown", "Critical", "Low", "Normal", "High", + "Full" + +What: /sys/class/power_supply//current_avg +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports an average IBAT current reading for the battery, over a + fixed period. Normally devices will provide a fixed interval in + which they average readings to smooth out the reported value. + + Access: Read + Valid values: Represented in microamps + +What: /sys/class/power_supply//current_max +Date: October 2010 +Contact: linux-pm@vger.kernel.org +Description: + Reports the maximum IBAT current allowed into the battery. + + Access: Read + Valid values: Represented in microamps + +What: /sys/class/power_supply//current_now +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports an instant, single IBAT current reading for the battery. + This value is not averaged/smoothed. + + Access: Read + Valid values: Represented in microamps + +What: /sys/class/power_supply//charge_type +Date: July 2009 +Contact: linux-pm@vger.kernel.org +Description: + Represents the type of charging currently being applied to the + battery. + + Access: Read + Valid values: "Unknown", "N/A", "Trickle", "Fast" + +What: /sys/class/power_supply//charge_term_current +Date: July 2014 +Contact: linux-pm@vger.kernel.org +Description: + Reports the charging current value which is used to determine + when the battery is considered full and charging should end. + + Access: Read + Valid values: Represented in microamps + +What: /sys/class/power_supply//health +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the health of the battery or battery side of charger + functionality. + + Access: Read + Valid values: "Unknown", "Good", "Overheat", "Dead", + "Over voltage", "Unspecified failure", "Cold", + "Watchdog timer expire", "Safety timer expire" + +What: /sys/class/power_supply//precharge_current +Date: June 2017 +Contact: linux-pm@vger.kernel.org +Description: + Reports the charging current applied during pre-charging phase + for a battery charge cycle. + + Access: Read + Valid values: Represented in microamps + +What: /sys/class/power_supply//present +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports whether a battery is present or not in the system. + + Access: Read + Valid values: + 0: Absent + 1: Present + +What: /sys/class/power_supply//status +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Represents the charging status of the battery. Normally this + is read-only reporting although for some supplies this can be + used to enable/disable charging to the battery. + + Access: Read, Write + Valid values: "Unknown", "Charging", "Discharging", + "Not charging", "Full" + +What: /sys/class/power_supply//technology +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Describes the battery technology supported by the supply. + + Access: Read + Valid values: "Unknown", "NiMH", "Li-ion", "Li-poly", "LiFe", + "NiCd", "LiMn" + +What: /sys/class/power_supply//temp +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the current TBAT battery temperature reading. + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_alert_max +Date: July 2012 +Contact: linux-pm@vger.kernel.org +Description: + Maximum TBAT temperature trip-wire value where the supply will + notify user-space of the event. This is normally used for the + battery charging scenario where user-space needs to know the + battery temperature has crossed an upper threshold so it can + take appropriate action (e.g. warning user that battery level is + critically high, and charging has stopped). + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_alert_min +Date: July 2012 +Contact: linux-pm@vger.kernel.org +Description: + Minimum TBAT temperature trip-wire value where the supply will + notify user-space of the event. This is normally used for the + battery charging scenario where user-space needs to know the + battery temperature has crossed a lower threshold so it can take + appropriate action (e.g. warning user that battery level is + high, and charging current has been reduced accordingly to + remedy the situation). + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_max +Date: July 2014 +Contact: linux-pm@vger.kernel.org +Description: + Reports the maximum allowed TBAT battery temperature for + charging. + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_min +Date: July 2014 +Contact: linux-pm@vger.kernel.org +Description: + Reports the minimum allowed TBAT battery temperature for + charging. + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//voltage_avg, +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports an average VBAT voltage reading for the battery, over a + fixed period. Normally devices will provide a fixed interval in + which they average readings to smooth out the reported value. + + Access: Read + Valid values: Represented in microvolts + +What: /sys/class/power_supply//voltage_max, +Date: January 2008 +Contact: linux-pm@vger.kernel.org +Description: + Reports the maximum safe VBAT voltage permitted for the battery, + during charging. + + Access: Read + Valid values: Represented in microvolts + +What: /sys/class/power_supply//voltage_min, +Date: January 2008 +Contact: linux-pm@vger.kernel.org +Description: + Reports the minimum safe VBAT voltage permitted for the battery, + during discharging. + + Access: Read + Valid values: Represented in microvolts + +What: /sys/class/power_supply//voltage_now, +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports an instant, single VBAT voltage reading for the battery. + This value is not averaged/smoothed. + + Access: Read + Valid values: Represented in microvolts + +===== USB Properties ===== + +What: /sys/class/power_supply//current_avg +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports an average IBUS current reading over a fixed period. + Normally devices will provide a fixed interval in which they + average readings to smooth out the reported value. + + Access: Read + Valid values: Represented in microamps + + +What: /sys/class/power_supply//current_max +Date: October 2010 +Contact: linux-pm@vger.kernel.org +Description: + Reports the maximum IBUS current the supply can support. + + Access: Read + Valid values: Represented in microamps + +What: /sys/class/power_supply//current_now +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the IBUS current supplied now. This value is generally + read-only reporting, unless the 'online' state of the supply + is set to be programmable, in which case this value can be set + within the reported min/max range. + + Access: Read, Write + Valid values: Represented in microamps + +What: /sys/class/power_supply//input_current_limit +Date: July 2014 +Contact: linux-pm@vger.kernel.org +Description: + Details the incoming IBUS current limit currently set in the + supply. Normally this is configured based on the type of + connection made (e.g. A configured SDP should output a maximum + of 500mA so the input current limit is set to the same value). + + Access: Read, Write + Valid values: Represented in microamps + +What: /sys/class/power_supply//online, +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Indicates if VBUS is present for the supply. When the supply is + online, and the supply allows it, then it's possible to switch + between online states (e.g. Fixed -> Programmable for a PD_PPS + USB supply so voltage and current can be controlled). + + Access: Read, Write + Valid values: + 0: Offline + 1: Online Fixed - Fixed Voltage Supply + 2: Online Programmable - Programmable Voltage Supply + +What: /sys/class/power_supply//temp +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the current supply temperature reading. This would + normally be the internal temperature of the device itself (e.g + TJUNC temperature of an IC) + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_alert_max +Date: July 2012 +Contact: linux-pm@vger.kernel.org +Description: + Maximum supply temperature trip-wire value where the supply will + notify user-space of the event. This is normally used for the + charging scenario where user-space needs to know the supply + temperature has crossed an upper threshold so it can take + appropriate action (e.g. warning user that the supply + temperature is critically high, and charging has stopped to + remedy the situation). + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_alert_min +Date: July 2012 +Contact: linux-pm@vger.kernel.org +Description: + Minimum supply temperature trip-wire value where the supply will + notify user-space of the event. This is normally used for the + charging scenario where user-space needs to know the supply + temperature has crossed a lower threshold so it can take + appropriate action (e.g. warning user that the supply + temperature is high, and charging current has been reduced + accordingly to remedy the situation). + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_max +Date: July 2014 +Contact: linux-pm@vger.kernel.org +Description: + Reports the maximum allowed supply temperature for operation. + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//temp_min +Date: July 2014 +Contact: linux-pm@vger.kernel.org +Description: + Reports the mainimum allowed supply temperature for operation. + + Access: Read + Valid values: Represented in 1/10 Degrees Celsius + +What: /sys/class/power_supply//usb_type +Date: March 2018 +Contact: linux-pm@vger.kernel.org +Description: + Reports what type of USB connection is currently active for + the supply, for example it can show if USB-PD capable source + is attached. + + Access: Read-Only + Valid values: "Unknown", "SDP", "DCP", "CDP", "ACA", "C", "PD", + "PD_DRP", "PD_PPS", "BrickID" + +What: /sys/class/power_supply//voltage_max +Date: January 2008 +Contact: linux-pm@vger.kernel.org +Description: + Reports the maximum VBUS voltage the supply can support. + + Access: Read + Valid values: Represented in microvolts + +What: /sys/class/power_supply//voltage_min +Date: January 2008 +Contact: linux-pm@vger.kernel.org +Description: + Reports the minimum VBUS voltage the supply can support. + + Access: Read + Valid values: Represented in microvolts + +What: /sys/class/power_supply//voltage_now +Date: May 2007 +Contact: linux-pm@vger.kernel.org +Description: + Reports the VBUS voltage supplied now. This value is generally + read-only reporting, unless the 'online' state of the supply + is set to be programmable, in which case this value can be set + within the reported min/max range. + + Access: Read, Write + Valid values: Represented in microvolts + +===== Device Specific Properties ===== + What: /sys/class/power/ds2760-battery.*/charge_now Date: May 2010 KernelVersion: 2.6.35 diff --git a/Documentation/ABI/testing/sysfs-class-rc b/Documentation/ABI/testing/sysfs-class-rc index 8be1fd3760e038738d5a78cf04db330bc9b86b14..6c0d6c8cb91105779324b78efe4d3c12bb191316 100644 --- a/Documentation/ABI/testing/sysfs-class-rc +++ b/Documentation/ABI/testing/sysfs-class-rc @@ -1,7 +1,7 @@ What: /sys/class/rc/ Date: Apr 2010 KernelVersion: 2.6.35 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: The rc/ class sub-directory belongs to the Remote Controller core and provides a sysfs interface for configuring infrared @@ -10,7 +10,7 @@ Description: What: /sys/class/rc/rcN/ Date: Apr 2010 KernelVersion: 2.6.35 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: A /sys/class/rc/rcN directory is created for each remote control receiver device where N is the number of the receiver. @@ -18,7 +18,7 @@ Description: What: /sys/class/rc/rcN/protocols Date: Jun 2010 KernelVersion: 2.6.36 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Reading this file returns a list of available protocols, something like: @@ -36,7 +36,7 @@ Description: What: /sys/class/rc/rcN/filter Date: Jan 2014 KernelVersion: 3.15 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Sets the scancode filter expected value. Use in combination with /sys/class/rc/rcN/filter_mask to set the @@ -49,7 +49,7 @@ Description: What: /sys/class/rc/rcN/filter_mask Date: Jan 2014 KernelVersion: 3.15 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Sets the scancode filter mask of bits to compare. Use in combination with /sys/class/rc/rcN/filter to set the bits @@ -64,7 +64,7 @@ Description: What: /sys/class/rc/rcN/wakeup_protocols Date: Feb 2017 KernelVersion: 4.11 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Reading this file returns a list of available protocols to use for the wakeup filter, something like: @@ -83,7 +83,7 @@ Description: What: /sys/class/rc/rcN/wakeup_filter Date: Jan 2014 KernelVersion: 3.15 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Sets the scancode wakeup filter expected value. Use in combination with /sys/class/rc/rcN/wakeup_filter_mask to @@ -98,7 +98,7 @@ Description: What: /sys/class/rc/rcN/wakeup_filter_mask Date: Jan 2014 KernelVersion: 3.15 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Sets the scancode wakeup filter mask of bits to compare. Use in combination with /sys/class/rc/rcN/wakeup_filter to set diff --git a/Documentation/ABI/testing/sysfs-class-rc-nuvoton b/Documentation/ABI/testing/sysfs-class-rc-nuvoton index 905bcdeedef26d214fe8e967eab1e936f76581fc..d3abe45f8690458552cdd2d07dfbde7374dfe1aa 100644 --- a/Documentation/ABI/testing/sysfs-class-rc-nuvoton +++ b/Documentation/ABI/testing/sysfs-class-rc-nuvoton @@ -1,7 +1,7 @@ What: /sys/class/rc/rcN/wakeup_data Date: Mar 2016 KernelVersion: 4.6 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab Description: Reading this file returns the stored CIR wakeup sequence. It starts with a pulse, followed by a space, pulse etc. diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac index 46ff929fd52a317808e1270d7c4a72de7a99970c..256a9e990c0b06ce070c623f35c4a7ce3bf8d083 100644 --- a/Documentation/ABI/testing/sysfs-devices-edac +++ b/Documentation/ABI/testing/sysfs-devices-edac @@ -77,7 +77,7 @@ Description: Read/Write attribute file that controls memory scrubbing. What: /sys/devices/system/edac/mc/mc*/max_location Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This attribute file displays the information about the last available memory slot in this memory controller. It is used by @@ -85,7 +85,7 @@ Description: This attribute file displays the information about the last What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/size Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This attribute file will display the size of dimm or rank. For dimm*/size, this is the size, in MB of the DIMM memory @@ -96,14 +96,14 @@ Description: This attribute file will display the size of dimm or rank. What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This attribute file will display what type of DRAM device is being utilized on this DIMM (x1, x2, x4, x8, ...). What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_edac_mode Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This attribute file will display what type of Error detection and correction is being utilized. For example: S4ECD4ED would @@ -111,7 +111,7 @@ Description: This attribute file will display what type of Error detection What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_label Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This control file allows this DIMM to have a label assigned to it. With this label in the module, when errors occur @@ -126,14 +126,14 @@ Description: This control file allows this DIMM to have a label assigned What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_location Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This attribute file will display the location (csrow/channel, branch/channel/slot or channel/slot) of the dimm or rank. What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_mem_type Date: April 2012 -Contact: Mauro Carvalho Chehab +Contact: Mauro Carvalho Chehab linux-edac@vger.kernel.org Description: This attribute file will display what type of memory is currently on this csrow. Normally, either buffered or diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 6048a81fa744ed9ab0262a63fdc40e5b5fb6e71f..73318225a3681b3473abe87d0fe1bb4c3447e7ed 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -238,9 +238,6 @@ Description: Discover and change clock speed of CPUs See files in Documentation/cpu-freq/ for more information. - In particular, read Documentation/cpu-freq/user-guide.txt - to learn how to control the knobs. - What: /sys/devices/system/cpu/cpu#/cpufreq/freqdomain_cpus Date: June 2013 diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 540553c933b6197e51cd81367ffbc16fbd24273d..9b0123388f182dfe5829205a73bdf5a63bd32e10 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -101,6 +101,7 @@ Date: February 2015 Contact: "Jaegeuk Kim" Description: Controls the trimming rate in batch mode. + What: /sys/fs/f2fs//cp_interval Date: October 2015 @@ -140,7 +141,7 @@ Contact: "Shuoran Liu" Description: Shows total written kbytes issued to disk. -What: /sys/fs/f2fs//feature +What: /sys/fs/f2fs//features Date: July 2017 Contact: "Jaegeuk Kim" Description: diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-hugepages b/Documentation/ABI/testing/sysfs-kernel-mm-hugepages index e21c00571cf4f082b04f12b168e376c500bd845d..fdaa2162fae1528529b7045c6e9184ae9620bc01 100644 --- a/Documentation/ABI/testing/sysfs-kernel-mm-hugepages +++ b/Documentation/ABI/testing/sysfs-kernel-mm-hugepages @@ -12,4 +12,4 @@ Description: free_hugepages surplus_hugepages resv_hugepages - See Documentation/vm/hugetlbpage.txt for details. + See Documentation/admin-guide/mm/hugetlbpage.rst for details. diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-ksm b/Documentation/ABI/testing/sysfs-kernel-mm-ksm index 73e653ee248160cb5612d620eeeeff5130f07bc6..dfc13244cda3bb567591fbd6cbcfaee6905731c6 100644 --- a/Documentation/ABI/testing/sysfs-kernel-mm-ksm +++ b/Documentation/ABI/testing/sysfs-kernel-mm-ksm @@ -40,7 +40,7 @@ Description: Kernel Samepage Merging daemon sysfs interface sleep_millisecs: how many milliseconds ksm should sleep between scans. - See Documentation/vm/ksm.txt for more information. + See Documentation/vm/ksm.rst for more information. What: /sys/kernel/mm/ksm/merge_across_nodes Date: January 2013 diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab index 2cc0a72b64be68cd5a4191dab5581a71780407e4..29601d93a1c2ea112899007c37a2f7297c655338 100644 --- a/Documentation/ABI/testing/sysfs-kernel-slab +++ b/Documentation/ABI/testing/sysfs-kernel-slab @@ -37,7 +37,7 @@ Description: The alloc_calls file is read-only and lists the kernel code locations from which allocations for this cache were performed. The alloc_calls file only contains information if debugging is - enabled for that cache (see Documentation/vm/slub.txt). + enabled for that cache (see Documentation/vm/slub.rst). What: /sys/kernel/slab/cache/alloc_fastpath Date: February 2008 @@ -219,7 +219,7 @@ Contact: Pekka Enberg , Description: The free_calls file is read-only and lists the locations of object frees if slab debugging is enabled (see - Documentation/vm/slub.txt). + Documentation/vm/slub.rst). What: /sys/kernel/slab/cache/free_fastpath Date: February 2008 diff --git a/Documentation/ABI/testing/sysfs-platform-ideapad-laptop b/Documentation/ABI/testing/sysfs-platform-ideapad-laptop index 597a2f3d1efceb7383c85dda040f4b1def8adddf..1b31be3f996a40df66299e4715ee4eb984ec12e1 100644 --- a/Documentation/ABI/testing/sysfs-platform-ideapad-laptop +++ b/Documentation/ABI/testing/sysfs-platform-ideapad-laptop @@ -25,3 +25,16 @@ Description: Control touchpad mode. * 1 -> Switched On * 0 -> Switched Off + +What: /sys/bus/pci/devices///VPC2004:00/fn_lock +Date: May 2018 +KernelVersion: 4.18 +Contact: "Oleg Keri " +Description: + Control fn-lock mode. + * 1 -> Switched On + * 0 -> Switched Off + + For example: + # echo "0" > \ + /sys/bus/pci/devices/0000:00:1f.0/PNP0C09:00/VPC2004:00/fn_lock diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt index 0b6bb3ef449ee7a8ee45afe9b4be61a2859ac8e3..688b69121e8294ca81d47689c25345e97d5de720 100644 --- a/Documentation/PCI/pci-error-recovery.txt +++ b/Documentation/PCI/pci-error-recovery.txt @@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error event will be platform-dependent, but will follow the general sequence described below. -STEP 0: Error Event +STEP 0: Error Event: ERR_NONFATAL ------------------- A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0xffffffff, @@ -228,13 +228,7 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform proceeds to STEP 4 (Slot Reset) -STEP 3: Link Reset ------------------- -The platform resets the link. This is a PCI-Express specific step -and is done whenever a fatal error has been detected that can be -"solved" by resetting the link. - -STEP 4: Slot Reset +STEP 3: Slot Reset ------------------ In response to a return value of PCI_ERS_RESULT_NEED_RESET, the @@ -320,7 +314,7 @@ Failure). >>> However, it probably should. -STEP 5: Resume Operations +STEP 4: Resume Operations ------------------------- The platform will call the resume() callback on all affected device drivers if all drivers on the segment have returned @@ -332,7 +326,7 @@ a result code. At this point, if a new error happens, the platform will restart a new error recovery sequence. -STEP 6: Permanent Failure +STEP 5: Permanent Failure ------------------------- A "permanent failure" has occurred, and the platform cannot recover the device. The platform will call error_detected() with a @@ -355,6 +349,27 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt for additional detail on real-life experience of the causes of software errors. +STEP 0: Error Event: ERR_FATAL +------------------- +PCI bus error is detected by the PCI hardware. On powerpc, the slot is +isolated, in that all I/O is blocked: all reads return 0xffffffff, all +writes are ignored. + +STEP 1: Remove devices +-------------------- +Platform removes the devices depending on the error agent, it could be +this port for all subordinates or upstream component (likely downstream +port) + +STEP 2: Reset link +-------------------- +The platform resets the link. This is a PCI-Express specific step and is +done whenever a fatal error has been detected that can be "solved" by +resetting the link. + +STEP 3: Re-enumerate the devices +-------------------- +Initiates the re-enumeration. Conclusion; General Remarks --------------------------- diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index a27fbfb0efb823646fd49b27574a24adb55369ab..65eb856526b7c308b08088a31957d70baa21b5a7 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -1,3 +1,5 @@ +What is RCU? -- "Read, Copy, Update" + Please note that the "What is RCU?" LWN series is an excellent place to start learning about RCU: diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst index ddcc58d01cfbce99a2da80a91101213203b83e82..14cefc020e2d5e3ffb16e5774bf58c7191b3ea13 100644 --- a/Documentation/accelerators/ocxl.rst +++ b/Documentation/accelerators/ocxl.rst @@ -157,6 +157,17 @@ OCXL_IOCTL_GET_METADATA: Obtains configuration information from the card, such at the size of MMIO areas, the AFU version, and the PASID for the current context. +OCXL_IOCTL_ENABLE_P9_WAIT: + + Allows the AFU to wake a userspace thread executing 'wait'. Returns + information to userspace to allow it to configure the AFU. Note that + this is only available on POWER9. + +OCXL_IOCTL_GET_FEATURES: + + Reports on which CPU features that affect OpenCAPI are usable from + userspace. + mmap ---- diff --git a/Documentation/acpi/cppc_sysfs.txt b/Documentation/acpi/cppc_sysfs.txt new file mode 100644 index 0000000000000000000000000000000000000000..f20fb445135d31953db86edccacca29c519fbf69 --- /dev/null +++ b/Documentation/acpi/cppc_sysfs.txt @@ -0,0 +1,69 @@ + + Collaborative Processor Performance Control (CPPC) + +CPPC defined in the ACPI spec describes a mechanism for the OS to manage the +performance of a logical processor on a contigious and abstract performance +scale. CPPC exposes a set of registers to describe abstract performance scale, +to request performance levels and to measure per-cpu delivered performance. + +For more details on CPPC please refer to the ACPI specification at: + +http://uefi.org/specifications + +Some of the CPPC registers are exposed via sysfs under: + +/sys/devices/system/cpu/cpuX/acpi_cppc/ + +for each cpu X + +-------------------------------------------------------------------------------- + +$ ls -lR /sys/devices/system/cpu/cpu0/acpi_cppc/ +/sys/devices/system/cpu/cpu0/acpi_cppc/: +total 0 +-r--r--r-- 1 root root 65536 Mar 5 19:38 feedback_ctrs +-r--r--r-- 1 root root 65536 Mar 5 19:38 highest_perf +-r--r--r-- 1 root root 65536 Mar 5 19:38 lowest_freq +-r--r--r-- 1 root root 65536 Mar 5 19:38 lowest_nonlinear_perf +-r--r--r-- 1 root root 65536 Mar 5 19:38 lowest_perf +-r--r--r-- 1 root root 65536 Mar 5 19:38 nominal_freq +-r--r--r-- 1 root root 65536 Mar 5 19:38 nominal_perf +-r--r--r-- 1 root root 65536 Mar 5 19:38 reference_perf +-r--r--r-- 1 root root 65536 Mar 5 19:38 wraparound_time + +-------------------------------------------------------------------------------- + +* highest_perf : Highest performance of this processor (abstract scale). +* nominal_perf : Highest sustained performance of this processor (abstract scale). +* lowest_nonlinear_perf : Lowest performance of this processor with nonlinear + power savings (abstract scale). +* lowest_perf : Lowest performance of this processor (abstract scale). + +* lowest_freq : CPU frequency corresponding to lowest_perf (in MHz). +* nominal_freq : CPU frequency corresponding to nominal_perf (in MHz). + The above frequencies should only be used to report processor performance in + freqency instead of abstract scale. These values should not be used for any + functional decisions. + +* feedback_ctrs : Includes both Reference and delivered performance counter. + Reference counter ticks up proportional to processor's reference performance. + Delivered counter ticks up proportional to processor's delivered performance. +* wraparound_time: Minimum time for the feedback counters to wraparound (seconds). +* reference_perf : Performance level at which reference performance counter + accumulates (abstract scale). + +-------------------------------------------------------------------------------- + + Computing Average Delivered Performance + +Below describes the steps to compute the average performance delivered by taking +two different snapshots of feedback counters at time T1 and T2. + +T1: Read feedback_ctrs as fbc_t1 + Wait or run some workload +T2: Read feedback_ctrs as fbc_t2 + +delivered_counter_delta = fbc_t2[del] - fbc_t1[del] +reference_counter_delta = fbc_t2[ref] - fbc_t1[ref] + +delivered_perf = (refernce_perf x delivered_counter_delta) / reference_counter_delta diff --git a/Documentation/acpi/method-customizing.txt b/Documentation/acpi/method-customizing.txt index a3f598e141f2eca4e844f868e9586dab657ec59d..7235da975f23b4e916353710b6480ec65716175f 100644 --- a/Documentation/acpi/method-customizing.txt +++ b/Documentation/acpi/method-customizing.txt @@ -16,7 +16,8 @@ control method rather than override the entire DSDT, because kernel rebuild/reboot is not needed and test result can be got in minutes. Note: Only ACPI METHOD can be overridden, any other object types like - "Device", "OperationRegion", are not recognized. + "Device", "OperationRegion", are not recognized. Methods + declared inside scope operators are also not supported. Note: The same ACPI control method can be overridden for many times, and it's always the latest one that used by Linux/kernel. Note: To get the ACPI debug object output (Store (AAAA, Debug)), @@ -32,8 +33,6 @@ Note: To get the ACPI debug object output (Store (AAAA, Debug)), DefinitionBlock ("", "SSDT", 1, "", "", 0x20080715) { - External (ACON) - Method (\_SB_.AC._PSR, 0, NotSerialized) { Store ("In AC _PSR", Debug) @@ -42,9 +41,10 @@ Note: To get the ACPI debug object output (Store (AAAA, Debug)), } Note that the full pathname of the method in ACPI namespace should be used. - And remember to use "External" to declare external objects. e) assemble the file to generate the AML code of the method. - e.g. "iasl psr.asl" (psr.aml is generated as a result) + e.g. "iasl -vw 6084 psr.asl" (psr.aml is generated as a result) + If parameter "-vw 6084" is not supported by your iASL compiler, + please try a newer version. f) mount debugfs by "mount -t debugfs none /sys/kernel/debug" g) override the old method via the debugfs by running "cat /tmp/psr.aml > /sys/kernel/debug/acpi/custom_method" diff --git a/Documentation/admin-guide/LSM/apparmor.rst b/Documentation/admin-guide/LSM/apparmor.rst index 3e9734bd0e0586bb737e65c45bd2e6d6fbf0e383..6cf81bbd7ce8b86bfcc740abe4e0bde49e239700 100644 --- a/Documentation/admin-guide/LSM/apparmor.rst +++ b/Documentation/admin-guide/LSM/apparmor.rst @@ -44,8 +44,8 @@ Links Mailing List - apparmor@lists.ubuntu.com -Wiki - http://apparmor.wiki.kernel.org/ +Wiki - http://wiki.apparmor.net -User space tools - https://launchpad.net/apparmor +User space tools - https://gitlab.com/apparmor -Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git +Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor diff --git a/Documentation/bcache.txt b/Documentation/admin-guide/bcache.rst similarity index 100% rename from Documentation/bcache.txt rename to Documentation/admin-guide/bcache.rst diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst new file mode 100644 index 0000000000000000000000000000000000000000..8a2c52d5c53b7aaa9c2fcc5b684c0ac3dbcd53dc --- /dev/null +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -0,0 +1,2040 @@ +================ +Control Group v2 +================ + +:Date: October, 2015 +:Author: Tejun Heo + +This is the authoritative documentation on the design, interface and +conventions of cgroup v2. It describes all userland-visible aspects +of cgroup including core and specific controller behaviors. All +future changes must be reflected in this document. Documentation for +v1 is available under Documentation/cgroup-v1/. + +.. CONTENTS + + 1. Introduction + 1-1. Terminology + 1-2. What is cgroup? + 2. Basic Operations + 2-1. Mounting + 2-2. Organizing Processes and Threads + 2-2-1. Processes + 2-2-2. Threads + 2-3. [Un]populated Notification + 2-4. Controlling Controllers + 2-4-1. Enabling and Disabling + 2-4-2. Top-down Constraint + 2-4-3. No Internal Process Constraint + 2-5. Delegation + 2-5-1. Model of Delegation + 2-5-2. Delegation Containment + 2-6. Guidelines + 2-6-1. Organize Once and Control + 2-6-2. Avoid Name Collisions + 3. Resource Distribution Models + 3-1. Weights + 3-2. Limits + 3-3. Protections + 3-4. Allocations + 4. Interface Files + 4-1. Format + 4-2. Conventions + 4-3. Core Interface Files + 5. Controllers + 5-1. CPU + 5-1-1. CPU Interface Files + 5-2. Memory + 5-2-1. Memory Interface Files + 5-2-2. Usage Guidelines + 5-2-3. Memory Ownership + 5-3. IO + 5-3-1. IO Interface Files + 5-3-2. Writeback + 5-4. PID + 5-4-1. PID Interface Files + 5-5. Device + 5-6. RDMA + 5-6-1. RDMA Interface Files + 5-7. Misc + 5-7-1. perf_event + 5-N. Non-normative information + 5-N-1. CPU controller root cgroup process behaviour + 5-N-2. IO controller root cgroup process behaviour + 6. Namespace + 6-1. Basics + 6-2. The Root and Views + 6-3. Migration and setns(2) + 6-4. Interaction with Other Namespaces + P. Information on Kernel Programming + P-1. Filesystem Support for Writeback + D. Deprecated v1 Core Features + R. Issues with v1 and Rationales for v2 + R-1. Multiple Hierarchies + R-2. Thread Granularity + R-3. Competition Between Inner Nodes and Threads + R-4. Other Interface Issues + R-5. Controller Issues and Remedies + R-5-1. Memory + + +Introduction +============ + +Terminology +----------- + +"cgroup" stands for "control group" and is never capitalized. The +singular form is used to designate the whole feature and also as a +qualifier as in "cgroup controllers". When explicitly referring to +multiple individual control groups, the plural form "cgroups" is used. + + +What is cgroup? +--------------- + +cgroup is a mechanism to organize processes hierarchically and +distribute system resources along the hierarchy in a controlled and +configurable manner. + +cgroup is largely composed of two parts - the core and controllers. +cgroup core is primarily responsible for hierarchically organizing +processes. A cgroup controller is usually responsible for +distributing a specific type of system resource along the hierarchy +although there are utility controllers which serve purposes other than +resource distribution. + +cgroups form a tree structure and every process in the system belongs +to one and only one cgroup. All threads of a process belong to the +same cgroup. On creation, all processes are put in the cgroup that +the parent process belongs to at the time. A process can be migrated +to another cgroup. Migration of a process doesn't affect already +existing descendant processes. + +Following certain structural constraints, controllers may be enabled or +disabled selectively on a cgroup. All controller behaviors are +hierarchical - if a controller is enabled on a cgroup, it affects all +processes which belong to the cgroups consisting the inclusive +sub-hierarchy of the cgroup. When a controller is enabled on a nested +cgroup, it always restricts the resource distribution further. The +restrictions set closer to the root in the hierarchy can not be +overridden from further away. + + +Basic Operations +================ + +Mounting +-------- + +Unlike v1, cgroup v2 has only single hierarchy. The cgroup v2 +hierarchy can be mounted with the following mount command:: + + # mount -t cgroup2 none $MOUNT_POINT + +cgroup2 filesystem has the magic number 0x63677270 ("cgrp"). All +controllers which support v2 and are not bound to a v1 hierarchy are +automatically bound to the v2 hierarchy and show up at the root. +Controllers which are not in active use in the v2 hierarchy can be +bound to other hierarchies. This allows mixing v2 hierarchy with the +legacy v1 multiple hierarchies in a fully backward compatible way. + +A controller can be moved across hierarchies only after the controller +is no longer referenced in its current hierarchy. Because per-cgroup +controller states are destroyed asynchronously and controllers may +have lingering references, a controller may not show up immediately on +the v2 hierarchy after the final umount of the previous hierarchy. +Similarly, a controller should be fully disabled to be moved out of +the unified hierarchy and it may take some time for the disabled +controller to become available for other hierarchies; furthermore, due +to inter-controller dependencies, other controllers may need to be +disabled too. + +While useful for development and manual configurations, moving +controllers dynamically between the v2 and other hierarchies is +strongly discouraged for production use. It is recommended to decide +the hierarchies and controller associations before starting using the +controllers after system boot. + +During transition to v2, system management software might still +automount the v1 cgroup filesystem and so hijack all controllers +during boot, before manual intervention is possible. To make testing +and experimenting easier, the kernel parameter cgroup_no_v1= allows +disabling controllers in v1 and make them always available in v2. + +cgroup v2 currently supports the following mount options. + + nsdelegate + + Consider cgroup namespaces as delegation boundaries. This + option is system wide and can only be set on mount or modified + through remount from the init namespace. The mount option is + ignored on non-init namespace mounts. Please refer to the + Delegation section for details. + + +Organizing Processes and Threads +-------------------------------- + +Processes +~~~~~~~~~ + +Initially, only the root cgroup exists to which all processes belong. +A child cgroup can be created by creating a sub-directory:: + + # mkdir $CGROUP_NAME + +A given cgroup may have multiple child cgroups forming a tree +structure. Each cgroup has a read-writable interface file +"cgroup.procs". When read, it lists the PIDs of all processes which +belong to the cgroup one-per-line. The PIDs are not ordered and the +same PID may show up more than once if the process got moved to +another cgroup and then back or the PID got recycled while reading. + +A process can be migrated into a cgroup by writing its PID to the +target cgroup's "cgroup.procs" file. Only one process can be migrated +on a single write(2) call. If a process is composed of multiple +threads, writing the PID of any thread migrates all threads of the +process. + +When a process forks a child process, the new process is born into the +cgroup that the forking process belongs to at the time of the +operation. After exit, a process stays associated with the cgroup +that it belonged to at the time of exit until it's reaped; however, a +zombie process does not appear in "cgroup.procs" and thus can't be +moved to another cgroup. + +A cgroup which doesn't have any children or live processes can be +destroyed by removing the directory. Note that a cgroup which doesn't +have any children and is associated only with zombie processes is +considered empty and can be removed:: + + # rmdir $CGROUP_NAME + +"/proc/$PID/cgroup" lists a process's cgroup membership. If legacy +cgroup is in use in the system, this file may contain multiple lines, +one for each hierarchy. The entry for cgroup v2 is always in the +format "0::$PATH":: + + # cat /proc/842/cgroup + ... + 0::/test-cgroup/test-cgroup-nested + +If the process becomes a zombie and the cgroup it was associated with +is removed subsequently, " (deleted)" is appended to the path:: + + # cat /proc/842/cgroup + ... + 0::/test-cgroup/test-cgroup-nested (deleted) + + +Threads +~~~~~~~ + +cgroup v2 supports thread granularity for a subset of controllers to +support use cases requiring hierarchical resource distribution across +the threads of a group of processes. By default, all threads of a +process belong to the same cgroup, which also serves as the resource +domain to host resource consumptions which are not specific to a +process or thread. The thread mode allows threads to be spread across +a subtree while still maintaining the common resource domain for them. + +Controllers which support thread mode are called threaded controllers. +The ones which don't are called domain controllers. + +Marking a cgroup threaded makes it join the resource domain of its +parent as a threaded cgroup. The parent may be another threaded +cgroup whose resource domain is further up in the hierarchy. The root +of a threaded subtree, that is, the nearest ancestor which is not +threaded, is called threaded domain or thread root interchangeably and +serves as the resource domain for the entire subtree. + +Inside a threaded subtree, threads of a process can be put in +different cgroups and are not subject to the no internal process +constraint - threaded controllers can be enabled on non-leaf cgroups +whether they have threads in them or not. + +As the threaded domain cgroup hosts all the domain resource +consumptions of the subtree, it is considered to have internal +resource consumptions whether there are processes in it or not and +can't have populated child cgroups which aren't threaded. Because the +root cgroup is not subject to no internal process constraint, it can +serve both as a threaded domain and a parent to domain cgroups. + +The current operation mode or type of the cgroup is shown in the +"cgroup.type" file which indicates whether the cgroup is a normal +domain, a domain which is serving as the domain of a threaded subtree, +or a threaded cgroup. + +On creation, a cgroup is always a domain cgroup and can be made +threaded by writing "threaded" to the "cgroup.type" file. The +operation is single direction:: + + # echo threaded > cgroup.type + +Once threaded, the cgroup can't be made a domain again. To enable the +thread mode, the following conditions must be met. + +- As the cgroup will join the parent's resource domain. The parent + must either be a valid (threaded) domain or a threaded cgroup. + +- When the parent is an unthreaded domain, it must not have any domain + controllers enabled or populated domain children. The root is + exempt from this requirement. + +Topology-wise, a cgroup can be in an invalid state. Please consider +the following topology:: + + A (threaded domain) - B (threaded) - C (domain, just created) + +C is created as a domain but isn't connected to a parent which can +host child domains. C can't be used until it is turned into a +threaded cgroup. "cgroup.type" file will report "domain (invalid)" in +these cases. Operations which fail due to invalid topology use +EOPNOTSUPP as the errno. + +A domain cgroup is turned into a threaded domain when one of its child +cgroup becomes threaded or threaded controllers are enabled in the +"cgroup.subtree_control" file while there are processes in the cgroup. +A threaded domain reverts to a normal domain when the conditions +clear. + +When read, "cgroup.threads" contains the list of the thread IDs of all +threads in the cgroup. Except that the operations are per-thread +instead of per-process, "cgroup.threads" has the same format and +behaves the same way as "cgroup.procs". While "cgroup.threads" can be +written to in any cgroup, as it can only move threads inside the same +threaded domain, its operations are confined inside each threaded +subtree. + +The threaded domain cgroup serves as the resource domain for the whole +subtree, and, while the threads can be scattered across the subtree, +all the processes are considered to be in the threaded domain cgroup. +"cgroup.procs" in a threaded domain cgroup contains the PIDs of all +processes in the subtree and is not readable in the subtree proper. +However, "cgroup.procs" can be written to from anywhere in the subtree +to migrate all threads of the matching process to the cgroup. + +Only threaded controllers can be enabled in a threaded subtree. When +a threaded controller is enabled inside a threaded subtree, it only +accounts for and controls resource consumptions associated with the +threads in the cgroup and its descendants. All consumptions which +aren't tied to a specific thread belong to the threaded domain cgroup. + +Because a threaded subtree is exempt from no internal process +constraint, a threaded controller must be able to handle competition +between threads in a non-leaf cgroup and its child cgroups. Each +threaded controller defines how such competitions are handled. + + +[Un]populated Notification +-------------------------- + +Each non-root cgroup has a "cgroup.events" file which contains +"populated" field indicating whether the cgroup's sub-hierarchy has +live processes in it. Its value is 0 if there is no live process in +the cgroup and its descendants; otherwise, 1. poll and [id]notify +events are triggered when the value changes. This can be used, for +example, to start a clean-up operation after all processes of a given +sub-hierarchy have exited. The populated state updates and +notifications are recursive. Consider the following sub-hierarchy +where the numbers in the parentheses represent the numbers of processes +in each cgroup:: + + A(4) - B(0) - C(1) + \ D(0) + +A, B and C's "populated" fields would be 1 while D's 0. After the one +process in C exits, B and C's "populated" fields would flip to "0" and +file modified events will be generated on the "cgroup.events" files of +both cgroups. + + +Controlling Controllers +----------------------- + +Enabling and Disabling +~~~~~~~~~~~~~~~~~~~~~~ + +Each cgroup has a "cgroup.controllers" file which lists all +controllers available for the cgroup to enable:: + + # cat cgroup.controllers + cpu io memory + +No controller is enabled by default. Controllers can be enabled and +disabled by writing to the "cgroup.subtree_control" file:: + + # echo "+cpu +memory -io" > cgroup.subtree_control + +Only controllers which are listed in "cgroup.controllers" can be +enabled. When multiple operations are specified as above, either they +all succeed or fail. If multiple operations on the same controller +are specified, the last one is effective. + +Enabling a controller in a cgroup indicates that the distribution of +the target resource across its immediate children will be controlled. +Consider the following sub-hierarchy. The enabled controllers are +listed in parentheses:: + + A(cpu,memory) - B(memory) - C() + \ D() + +As A has "cpu" and "memory" enabled, A will control the distribution +of CPU cycles and memory to its children, in this case, B. As B has +"memory" enabled but not "CPU", C and D will compete freely on CPU +cycles but their division of memory available to B will be controlled. + +As a controller regulates the distribution of the target resource to +the cgroup's children, enabling it creates the controller's interface +files in the child cgroups. In the above example, enabling "cpu" on B +would create the "cpu." prefixed controller interface files in C and +D. Likewise, disabling "memory" from B would remove the "memory." +prefixed controller interface files from C and D. This means that the +controller interface files - anything which doesn't start with +"cgroup." are owned by the parent rather than the cgroup itself. + + +Top-down Constraint +~~~~~~~~~~~~~~~~~~~ + +Resources are distributed top-down and a cgroup can further distribute +a resource only if the resource has been distributed to it from the +parent. This means that all non-root "cgroup.subtree_control" files +can only contain controllers which are enabled in the parent's +"cgroup.subtree_control" file. A controller can be enabled only if +the parent has the controller enabled and a controller can't be +disabled if one or more children have it enabled. + + +No Internal Process Constraint +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Non-root cgroups can distribute domain resources to their children +only when they don't have any processes of their own. In other words, +only domain cgroups which don't contain any processes can have domain +controllers enabled in their "cgroup.subtree_control" files. + +This guarantees that, when a domain controller is looking at the part +of the hierarchy which has it enabled, processes are always only on +the leaves. This rules out situations where child cgroups compete +against internal processes of the parent. + +The root cgroup is exempt from this restriction. Root contains +processes and anonymous resource consumption which can't be associated +with any other cgroups and requires special treatment from most +controllers. How resource consumption in the root cgroup is governed +is up to each controller (for more information on this topic please +refer to the Non-normative information section in the Controllers +chapter). + +Note that the restriction doesn't get in the way if there is no +enabled controller in the cgroup's "cgroup.subtree_control". This is +important as otherwise it wouldn't be possible to create children of a +populated cgroup. To control resource distribution of a cgroup, the +cgroup must create children and transfer all its processes to the +children before enabling controllers in its "cgroup.subtree_control" +file. + + +Delegation +---------- + +Model of Delegation +~~~~~~~~~~~~~~~~~~~ + +A cgroup can be delegated in two ways. First, to a less privileged +user by granting write access of the directory and its "cgroup.procs", +"cgroup.threads" and "cgroup.subtree_control" files to the user. +Second, if the "nsdelegate" mount option is set, automatically to a +cgroup namespace on namespace creation. + +Because the resource control interface files in a given directory +control the distribution of the parent's resources, the delegatee +shouldn't be allowed to write to them. For the first method, this is +achieved by not granting access to these files. For the second, the +kernel rejects writes to all files other than "cgroup.procs" and +"cgroup.subtree_control" on a namespace root from inside the +namespace. + +The end results are equivalent for both delegation types. Once +delegated, the user can build sub-hierarchy under the directory, +organize processes inside it as it sees fit and further distribute the +resources it received from the parent. The limits and other settings +of all resource controllers are hierarchical and regardless of what +happens in the delegated sub-hierarchy, nothing can escape the +resource restrictions imposed by the parent. + +Currently, cgroup doesn't impose any restrictions on the number of +cgroups in or nesting depth of a delegated sub-hierarchy; however, +this may be limited explicitly in the future. + + +Delegation Containment +~~~~~~~~~~~~~~~~~~~~~~ + +A delegated sub-hierarchy is contained in the sense that processes +can't be moved into or out of the sub-hierarchy by the delegatee. + +For delegations to a less privileged user, this is achieved by +requiring the following conditions for a process with a non-root euid +to migrate a target process into a cgroup by writing its PID to the +"cgroup.procs" file. + +- The writer must have write access to the "cgroup.procs" file. + +- The writer must have write access to the "cgroup.procs" file of the + common ancestor of the source and destination cgroups. + +The above two constraints ensure that while a delegatee may migrate +processes around freely in the delegated sub-hierarchy it can't pull +in from or push out to outside the sub-hierarchy. + +For an example, let's assume cgroups C0 and C1 have been delegated to +user U0 who created C00, C01 under C0 and C10 under C1 as follows and +all processes under C0 and C1 belong to U0:: + + ~~~~~~~~~~~~~ - C0 - C00 + ~ cgroup ~ \ C01 + ~ hierarchy ~ + ~~~~~~~~~~~~~ - C1 - C10 + +Let's also say U0 wants to write the PID of a process which is +currently in C10 into "C00/cgroup.procs". U0 has write access to the +file; however, the common ancestor of the source cgroup C10 and the +destination cgroup C00 is above the points of delegation and U0 would +not have write access to its "cgroup.procs" files and thus the write +will be denied with -EACCES. + +For delegations to namespaces, containment is achieved by requiring +that both the source and destination cgroups are reachable from the +namespace of the process which is attempting the migration. If either +is not reachable, the migration is rejected with -ENOENT. + + +Guidelines +---------- + +Organize Once and Control +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Migrating a process across cgroups is a relatively expensive operation +and stateful resources such as memory are not moved together with the +process. This is an explicit design decision as there often exist +inherent trade-offs between migration and various hot paths in terms +of synchronization cost. + +As such, migrating processes across cgroups frequently as a means to +apply different resource restrictions is discouraged. A workload +should be assigned to a cgroup according to the system's logical and +resource structure once on start-up. Dynamic adjustments to resource +distribution can be made by changing controller configuration through +the interface files. + + +Avoid Name Collisions +~~~~~~~~~~~~~~~~~~~~~ + +Interface files for a cgroup and its children cgroups occupy the same +directory and it is possible to create children cgroups which collide +with interface files. + +All cgroup core interface files are prefixed with "cgroup." and each +controller's interface files are prefixed with the controller name and +a dot. A controller's name is composed of lower case alphabets and +'_'s but never begins with an '_' so it can be used as the prefix +character for collision avoidance. Also, interface file names won't +start or end with terms which are often used in categorizing workloads +such as job, service, slice, unit or workload. + +cgroup doesn't do anything to prevent name collisions and it's the +user's responsibility to avoid them. + + +Resource Distribution Models +============================ + +cgroup controllers implement several resource distribution schemes +depending on the resource type and expected use cases. This section +describes major schemes in use along with their expected behaviors. + + +Weights +------- + +A parent's resource is distributed by adding up the weights of all +active children and giving each the fraction matching the ratio of its +weight against the sum. As only children which can make use of the +resource at the moment participate in the distribution, this is +work-conserving. Due to the dynamic nature, this model is usually +used for stateless resources. + +All weights are in the range [1, 10000] with the default at 100. This +allows symmetric multiplicative biases in both directions at fine +enough granularity while staying in the intuitive range. + +As long as the weight is in range, all configuration combinations are +valid and there is no reason to reject configuration changes or +process migrations. + +"cpu.weight" proportionally distributes CPU cycles to active children +and is an example of this type. + + +Limits +------ + +A child can only consume upto the configured amount of the resource. +Limits can be over-committed - the sum of the limits of children can +exceed the amount of resource available to the parent. + +Limits are in the range [0, max] and defaults to "max", which is noop. + +As limits can be over-committed, all configuration combinations are +valid and there is no reason to reject configuration changes or +process migrations. + +"io.max" limits the maximum BPS and/or IOPS that a cgroup can consume +on an IO device and is an example of this type. + + +Protections +----------- + +A cgroup is protected to be allocated upto the configured amount of +the resource if the usages of all its ancestors are under their +protected levels. Protections can be hard guarantees or best effort +soft boundaries. Protections can also be over-committed in which case +only upto the amount available to the parent is protected among +children. + +Protections are in the range [0, max] and defaults to 0, which is +noop. + +As protections can be over-committed, all configuration combinations +are valid and there is no reason to reject configuration changes or +process migrations. + +"memory.low" implements best-effort memory protection and is an +example of this type. + + +Allocations +----------- + +A cgroup is exclusively allocated a certain amount of a finite +resource. Allocations can't be over-committed - the sum of the +allocations of children can not exceed the amount of resource +available to the parent. + +Allocations are in the range [0, max] and defaults to 0, which is no +resource. + +As allocations can't be over-committed, some configuration +combinations are invalid and should be rejected. Also, if the +resource is mandatory for execution of processes, process migrations +may be rejected. + +"cpu.rt.max" hard-allocates realtime slices and is an example of this +type. + + +Interface Files +=============== + +Format +------ + +All interface files should be in one of the following formats whenever +possible:: + + New-line separated values + (when only one value can be written at once) + + VAL0\n + VAL1\n + ... + + Space separated values + (when read-only or multiple values can be written at once) + + VAL0 VAL1 ...\n + + Flat keyed + + KEY0 VAL0\n + KEY1 VAL1\n + ... + + Nested keyed + + KEY0 SUB_KEY0=VAL00 SUB_KEY1=VAL01... + KEY1 SUB_KEY0=VAL10 SUB_KEY1=VAL11... + ... + +For a writable file, the format for writing should generally match +reading; however, controllers may allow omitting later fields or +implement restricted shortcuts for most common use cases. + +For both flat and nested keyed files, only the values for a single key +can be written at a time. For nested keyed files, the sub key pairs +may be specified in any order and not all pairs have to be specified. + + +Conventions +----------- + +- Settings for a single feature should be contained in a single file. + +- The root cgroup should be exempt from resource control and thus + shouldn't have resource control interface files. Also, + informational files on the root cgroup which end up showing global + information available elsewhere shouldn't exist. + +- If a controller implements weight based resource distribution, its + interface file should be named "weight" and have the range [1, + 10000] with 100 as the default. The values are chosen to allow + enough and symmetric bias in both directions while keeping it + intuitive (the default is 100%). + +- If a controller implements an absolute resource guarantee and/or + limit, the interface files should be named "min" and "max" + respectively. If a controller implements best effort resource + guarantee and/or limit, the interface files should be named "low" + and "high" respectively. + + In the above four control files, the special token "max" should be + used to represent upward infinity for both reading and writing. + +- If a setting has a configurable default value and keyed specific + overrides, the default entry should be keyed with "default" and + appear as the first entry in the file. + + The default value can be updated by writing either "default $VAL" or + "$VAL". + + When writing to update a specific override, "default" can be used as + the value to indicate removal of the override. Override entries + with "default" as the value must not appear when read. + + For example, a setting which is keyed by major:minor device numbers + with integer values may look like the following:: + + # cat cgroup-example-interface-file + default 150 + 8:0 300 + + The default value can be updated by:: + + # echo 125 > cgroup-example-interface-file + + or:: + + # echo "default 125" > cgroup-example-interface-file + + An override can be set by:: + + # echo "8:16 170" > cgroup-example-interface-file + + and cleared by:: + + # echo "8:0 default" > cgroup-example-interface-file + # cat cgroup-example-interface-file + default 125 + 8:16 170 + +- For events which are not very high frequency, an interface file + "events" should be created which lists event key value pairs. + Whenever a notifiable event happens, file modified event should be + generated on the file. + + +Core Interface Files +-------------------- + +All cgroup core files are prefixed with "cgroup." + + cgroup.type + + A read-write single value file which exists on non-root + cgroups. + + When read, it indicates the current type of the cgroup, which + can be one of the following values. + + - "domain" : A normal valid domain cgroup. + + - "domain threaded" : A threaded domain cgroup which is + serving as the root of a threaded subtree. + + - "domain invalid" : A cgroup which is in an invalid state. + It can't be populated or have controllers enabled. It may + be allowed to become a threaded cgroup. + + - "threaded" : A threaded cgroup which is a member of a + threaded subtree. + + A cgroup can be turned into a threaded cgroup by writing + "threaded" to this file. + + cgroup.procs + A read-write new-line separated values file which exists on + all cgroups. + + When read, it lists the PIDs of all processes which belong to + the cgroup one-per-line. The PIDs are not ordered and the + same PID may show up more than once if the process got moved + to another cgroup and then back or the PID got recycled while + reading. + + A PID can be written to migrate the process associated with + the PID to the cgroup. The writer should match all of the + following conditions. + + - It must have write access to the "cgroup.procs" file. + + - It must have write access to the "cgroup.procs" file of the + common ancestor of the source and destination cgroups. + + When delegating a sub-hierarchy, write access to this file + should be granted along with the containing directory. + + In a threaded cgroup, reading this file fails with EOPNOTSUPP + as all the processes belong to the thread root. Writing is + supported and moves every thread of the process to the cgroup. + + cgroup.threads + A read-write new-line separated values file which exists on + all cgroups. + + When read, it lists the TIDs of all threads which belong to + the cgroup one-per-line. The TIDs are not ordered and the + same TID may show up more than once if the thread got moved to + another cgroup and then back or the TID got recycled while + reading. + + A TID can be written to migrate the thread associated with the + TID to the cgroup. The writer should match all of the + following conditions. + + - It must have write access to the "cgroup.threads" file. + + - The cgroup that the thread is currently in must be in the + same resource domain as the destination cgroup. + + - It must have write access to the "cgroup.procs" file of the + common ancestor of the source and destination cgroups. + + When delegating a sub-hierarchy, write access to this file + should be granted along with the containing directory. + + cgroup.controllers + A read-only space separated values file which exists on all + cgroups. + + It shows space separated list of all controllers available to + the cgroup. The controllers are not ordered. + + cgroup.subtree_control + A read-write space separated values file which exists on all + cgroups. Starts out empty. + + When read, it shows space separated list of the controllers + which are enabled to control resource distribution from the + cgroup to its children. + + Space separated list of controllers prefixed with '+' or '-' + can be written to enable or disable controllers. A controller + name prefixed with '+' enables the controller and '-' + disables. If a controller appears more than once on the list, + the last one is effective. When multiple enable and disable + operations are specified, either all succeed or all fail. + + cgroup.events + A read-only flat-keyed file which exists on non-root cgroups. + The following entries are defined. Unless specified + otherwise, a value change in this file generates a file + modified event. + + populated + 1 if the cgroup or its descendants contains any live + processes; otherwise, 0. + + cgroup.max.descendants + A read-write single value files. The default is "max". + + Maximum allowed number of descent cgroups. + If the actual number of descendants is equal or larger, + an attempt to create a new cgroup in the hierarchy will fail. + + cgroup.max.depth + A read-write single value files. The default is "max". + + Maximum allowed descent depth below the current cgroup. + If the actual descent depth is equal or larger, + an attempt to create a new child cgroup will fail. + + cgroup.stat + A read-only flat-keyed file with the following entries: + + nr_descendants + Total number of visible descendant cgroups. + + nr_dying_descendants + Total number of dying descendant cgroups. A cgroup becomes + dying after being deleted by a user. The cgroup will remain + in dying state for some time undefined time (which can depend + on system load) before being completely destroyed. + + A process can't enter a dying cgroup under any circumstances, + a dying cgroup can't revive. + + A dying cgroup can consume system resources not exceeding + limits, which were active at the moment of cgroup deletion. + + +Controllers +=========== + +CPU +--- + +The "cpu" controllers regulates distribution of CPU cycles. This +controller implements weight and absolute bandwidth limit models for +normal scheduling policy and absolute bandwidth allocation model for +realtime scheduling policy. + +WARNING: cgroup2 doesn't yet support control of realtime processes and +the cpu controller can only be enabled when all RT processes are in +the root cgroup. Be aware that system management software may already +have placed RT processes into nonroot cgroups during the system boot +process, and these processes may need to be moved to the root cgroup +before the cpu controller can be enabled. + + +CPU Interface Files +~~~~~~~~~~~~~~~~~~~ + +All time durations are in microseconds. + + cpu.stat + A read-only flat-keyed file which exists on non-root cgroups. + This file exists whether the controller is enabled or not. + + It always reports the following three stats: + + - usage_usec + - user_usec + - system_usec + + and the following three when the controller is enabled: + + - nr_periods + - nr_throttled + - throttled_usec + + cpu.weight + A read-write single value file which exists on non-root + cgroups. The default is "100". + + The weight in the range [1, 10000]. + + cpu.weight.nice + A read-write single value file which exists on non-root + cgroups. The default is "0". + + The nice value is in the range [-20, 19]. + + This interface file is an alternative interface for + "cpu.weight" and allows reading and setting weight using the + same values used by nice(2). Because the range is smaller and + granularity is coarser for the nice values, the read value is + the closest approximation of the current weight. + + cpu.max + A read-write two value file which exists on non-root cgroups. + The default is "max 100000". + + The maximum bandwidth limit. It's in the following format:: + + $MAX $PERIOD + + which indicates that the group may consume upto $MAX in each + $PERIOD duration. "max" for $MAX indicates no limit. If only + one number is written, $MAX is updated. + + +Memory +------ + +The "memory" controller regulates distribution of memory. Memory is +stateful and implements both limit and protection models. Due to the +intertwining between memory usage and reclaim pressure and the +stateful nature of memory, the distribution model is relatively +complex. + +While not completely water-tight, all major memory usages by a given +cgroup are tracked so that the total memory consumption can be +accounted and controlled to a reasonable extent. Currently, the +following types of memory usages are tracked. + +- Userland memory - page cache and anonymous memory. + +- Kernel data structures such as dentries and inodes. + +- TCP socket buffers. + +The above list may expand in the future for better coverage. + + +Memory Interface Files +~~~~~~~~~~~~~~~~~~~~~~ + +All memory amounts are in bytes. If a value which is not aligned to +PAGE_SIZE is written, the value may be rounded up to the closest +PAGE_SIZE multiple when read back. + + memory.current + A read-only single value file which exists on non-root + cgroups. + + The total amount of memory currently being used by the cgroup + and its descendants. + + memory.min + A read-write single value file which exists on non-root + cgroups. The default is "0". + + Hard memory protection. If the memory usage of a cgroup + is within its effective min boundary, the cgroup's memory + won't be reclaimed under any conditions. If there is no + unprotected reclaimable memory available, OOM killer + is invoked. + + Effective min boundary is limited by memory.min values of + all ancestor cgroups. If there is memory.min overcommitment + (child cgroup or cgroups are requiring more protected memory + than parent will allow), then each child cgroup will get + the part of parent's protection proportional to its + actual memory usage below memory.min. + + Putting more memory than generally available under this + protection is discouraged and may lead to constant OOMs. + + If a memory cgroup is not populated with processes, + its memory.min is ignored. + + memory.low + A read-write single value file which exists on non-root + cgroups. The default is "0". + + Best-effort memory protection. If the memory usage of a + cgroup is within its effective low boundary, the cgroup's + memory won't be reclaimed unless memory can be reclaimed + from unprotected cgroups. + + Effective low boundary is limited by memory.low values of + all ancestor cgroups. If there is memory.low overcommitment + (child cgroup or cgroups are requiring more protected memory + than parent will allow), then each child cgroup will get + the part of parent's protection proportional to its + actual memory usage below memory.low. + + Putting more memory than generally available under this + protection is discouraged. + + memory.high + A read-write single value file which exists on non-root + cgroups. The default is "max". + + Memory usage throttle limit. This is the main mechanism to + control memory usage of a cgroup. If a cgroup's usage goes + over the high boundary, the processes of the cgroup are + throttled and put under heavy reclaim pressure. + + Going over the high limit never invokes the OOM killer and + under extreme conditions the limit may be breached. + + memory.max + A read-write single value file which exists on non-root + cgroups. The default is "max". + + Memory usage hard limit. This is the final protection + mechanism. If a cgroup's memory usage reaches this limit and + can't be reduced, the OOM killer is invoked in the cgroup. + Under certain circumstances, the usage may go over the limit + temporarily. + + This is the ultimate protection mechanism. As long as the + high limit is used and monitored properly, this limit's + utility is limited to providing the final safety net. + + memory.events + A read-only flat-keyed file which exists on non-root cgroups. + The following entries are defined. Unless specified + otherwise, a value change in this file generates a file + modified event. + + low + The number of times the cgroup is reclaimed due to + high memory pressure even though its usage is under + the low boundary. This usually indicates that the low + boundary is over-committed. + + high + The number of times processes of the cgroup are + throttled and routed to perform direct memory reclaim + because the high memory boundary was exceeded. For a + cgroup whose memory usage is capped by the high limit + rather than global memory pressure, this event's + occurrences are expected. + + max + The number of times the cgroup's memory usage was + about to go over the max boundary. If direct reclaim + fails to bring it down, the cgroup goes to OOM state. + + oom + The number of time the cgroup's memory usage was + reached the limit and allocation was about to fail. + + Depending on context result could be invocation of OOM + killer and retrying allocation or failing allocation. + + Failed allocation in its turn could be returned into + userspace as -ENOMEM or silently ignored in cases like + disk readahead. For now OOM in memory cgroup kills + tasks iff shortage has happened inside page fault. + + oom_kill + The number of processes belonging to this cgroup + killed by any kind of OOM killer. + + memory.stat + A read-only flat-keyed file which exists on non-root cgroups. + + This breaks down the cgroup's memory footprint into different + types of memory, type-specific details, and other information + on the state and past events of the memory management system. + + All memory amounts are in bytes. + + The entries are ordered to be human readable, and new entries + can show up in the middle. Don't rely on items remaining in a + fixed position; use the keys to look up specific values! + + anon + Amount of memory used in anonymous mappings such as + brk(), sbrk(), and mmap(MAP_ANONYMOUS) + + file + Amount of memory used to cache filesystem data, + including tmpfs and shared memory. + + kernel_stack + Amount of memory allocated to kernel stacks. + + slab + Amount of memory used for storing in-kernel data + structures. + + sock + Amount of memory used in network transmission buffers + + shmem + Amount of cached filesystem data that is swap-backed, + such as tmpfs, shm segments, shared anonymous mmap()s + + file_mapped + Amount of cached filesystem data mapped with mmap() + + file_dirty + Amount of cached filesystem data that was modified but + not yet written back to disk + + file_writeback + Amount of cached filesystem data that was modified and + is currently being written back to disk + + inactive_anon, active_anon, inactive_file, active_file, unevictable + Amount of memory, swap-backed and filesystem-backed, + on the internal memory management lists used by the + page reclaim algorithm + + slab_reclaimable + Part of "slab" that might be reclaimed, such as + dentries and inodes. + + slab_unreclaimable + Part of "slab" that cannot be reclaimed on memory + pressure. + + pgfault + Total number of page faults incurred + + pgmajfault + Number of major page faults incurred + + workingset_refault + + Number of refaults of previously evicted pages + + workingset_activate + + Number of refaulted pages that were immediately activated + + workingset_nodereclaim + + Number of times a shadow node has been reclaimed + + pgrefill + + Amount of scanned pages (in an active LRU list) + + pgscan + + Amount of scanned pages (in an inactive LRU list) + + pgsteal + + Amount of reclaimed pages + + pgactivate + + Amount of pages moved to the active LRU list + + pgdeactivate + + Amount of pages moved to the inactive LRU lis + + pglazyfree + + Amount of pages postponed to be freed under memory pressure + + pglazyfreed + + Amount of reclaimed lazyfree pages + + memory.swap.current + A read-only single value file which exists on non-root + cgroups. + + The total amount of swap currently being used by the cgroup + and its descendants. + + memory.swap.max + A read-write single value file which exists on non-root + cgroups. The default is "max". + + Swap usage hard limit. If a cgroup's swap usage reaches this + limit, anonymous memory of the cgroup will not be swapped out. + + memory.swap.events + A read-only flat-keyed file which exists on non-root cgroups. + The following entries are defined. Unless specified + otherwise, a value change in this file generates a file + modified event. + + max + The number of times the cgroup's swap usage was about + to go over the max boundary and swap allocation + failed. + + fail + The number of times swap allocation failed either + because of running out of swap system-wide or max + limit. + + When reduced under the current usage, the existing swap + entries are reclaimed gradually and the swap usage may stay + higher than the limit for an extended period of time. This + reduces the impact on the workload and memory management. + + +Usage Guidelines +~~~~~~~~~~~~~~~~ + +"memory.high" is the main mechanism to control memory usage. +Over-committing on high limit (sum of high limits > available memory) +and letting global memory pressure to distribute memory according to +usage is a viable strategy. + +Because breach of the high limit doesn't trigger the OOM killer but +throttles the offending cgroup, a management agent has ample +opportunities to monitor and take appropriate actions such as granting +more memory or terminating the workload. + +Determining whether a cgroup has enough memory is not trivial as +memory usage doesn't indicate whether the workload can benefit from +more memory. For example, a workload which writes data received from +network to a file can use all available memory but can also operate as +performant with a small amount of memory. A measure of memory +pressure - how much the workload is being impacted due to lack of +memory - is necessary to determine whether a workload needs more +memory; unfortunately, memory pressure monitoring mechanism isn't +implemented yet. + + +Memory Ownership +~~~~~~~~~~~~~~~~ + +A memory area is charged to the cgroup which instantiated it and stays +charged to the cgroup until the area is released. Migrating a process +to a different cgroup doesn't move the memory usages that it +instantiated while in the previous cgroup to the new cgroup. + +A memory area may be used by processes belonging to different cgroups. +To which cgroup the area will be charged is in-deterministic; however, +over time, the memory area is likely to end up in a cgroup which has +enough memory allowance to avoid high reclaim pressure. + +If a cgroup sweeps a considerable amount of memory which is expected +to be accessed repeatedly by other cgroups, it may make sense to use +POSIX_FADV_DONTNEED to relinquish the ownership of memory areas +belonging to the affected files to ensure correct memory ownership. + + +IO +-- + +The "io" controller regulates the distribution of IO resources. This +controller implements both weight based and absolute bandwidth or IOPS +limit distribution; however, weight based distribution is available +only if cfq-iosched is in use and neither scheme is available for +blk-mq devices. + + +IO Interface Files +~~~~~~~~~~~~~~~~~~ + + io.stat + A read-only nested-keyed file which exists on non-root + cgroups. + + Lines are keyed by $MAJ:$MIN device numbers and not ordered. + The following nested keys are defined. + + ====== =================== + rbytes Bytes read + wbytes Bytes written + rios Number of read IOs + wios Number of write IOs + ====== =================== + + An example read output follows: + + 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 + 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 + + io.weight + A read-write flat-keyed file which exists on non-root cgroups. + The default is "default 100". + + The first line is the default weight applied to devices + without specific override. The rest are overrides keyed by + $MAJ:$MIN device numbers and not ordered. The weights are in + the range [1, 10000] and specifies the relative amount IO time + the cgroup can use in relation to its siblings. + + The default weight can be updated by writing either "default + $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing + "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". + + An example read output follows:: + + default 100 + 8:16 200 + 8:0 50 + + io.max + A read-write nested-keyed file which exists on non-root + cgroups. + + BPS and IOPS based IO limit. Lines are keyed by $MAJ:$MIN + device numbers and not ordered. The following nested keys are + defined. + + ===== ================================== + rbps Max read bytes per second + wbps Max write bytes per second + riops Max read IO operations per second + wiops Max write IO operations per second + ===== ================================== + + When writing, any number of nested key-value pairs can be + specified in any order. "max" can be specified as the value + to remove a specific limit. If the same key is specified + multiple times, the outcome is undefined. + + BPS and IOPS are measured in each IO direction and IOs are + delayed if limit is reached. Temporary bursts are allowed. + + Setting read limit at 2M BPS and write at 120 IOPS for 8:16:: + + echo "8:16 rbps=2097152 wiops=120" > io.max + + Reading returns the following:: + + 8:16 rbps=2097152 wbps=max riops=max wiops=120 + + Write IOPS limit can be removed by writing the following:: + + echo "8:16 wiops=max" > io.max + + Reading now returns the following:: + + 8:16 rbps=2097152 wbps=max riops=max wiops=max + + +Writeback +~~~~~~~~~ + +Page cache is dirtied through buffered writes and shared mmaps and +written asynchronously to the backing filesystem by the writeback +mechanism. Writeback sits between the memory and IO domains and +regulates the proportion of dirty memory by balancing dirtying and +write IOs. + +The io controller, in conjunction with the memory controller, +implements control of page cache writeback IOs. The memory controller +defines the memory domain that dirty memory ratio is calculated and +maintained for and the io controller defines the io domain which +writes out dirty pages for the memory domain. Both system-wide and +per-cgroup dirty memory states are examined and the more restrictive +of the two is enforced. + +cgroup writeback requires explicit support from the underlying +filesystem. Currently, cgroup writeback is implemented on ext2, ext4 +and btrfs. On other filesystems, all writeback IOs are attributed to +the root cgroup. + +There are inherent differences in memory and writeback management +which affects how cgroup ownership is tracked. Memory is tracked per +page while writeback per inode. For the purpose of writeback, an +inode is assigned to a cgroup and all IO requests to write dirty pages +from the inode are attributed to that cgroup. + +As cgroup ownership for memory is tracked per page, there can be pages +which are associated with different cgroups than the one the inode is +associated with. These are called foreign pages. The writeback +constantly keeps track of foreign pages and, if a particular foreign +cgroup becomes the majority over a certain period of time, switches +the ownership of the inode to that cgroup. + +While this model is enough for most use cases where a given inode is +mostly dirtied by a single cgroup even when the main writing cgroup +changes over time, use cases where multiple cgroups write to a single +inode simultaneously are not supported well. In such circumstances, a +significant portion of IOs are likely to be attributed incorrectly. +As memory controller assigns page ownership on the first use and +doesn't update it until the page is released, even if writeback +strictly follows page ownership, multiple cgroups dirtying overlapping +areas wouldn't work as expected. It's recommended to avoid such usage +patterns. + +The sysctl knobs which affect writeback behavior are applied to cgroup +writeback as follows. + + vm.dirty_background_ratio, vm.dirty_ratio + These ratios apply the same to cgroup writeback with the + amount of available memory capped by limits imposed by the + memory controller and system-wide clean memory. + + vm.dirty_background_bytes, vm.dirty_bytes + For cgroup writeback, this is calculated into ratio against + total available memory and applied the same way as + vm.dirty[_background]_ratio. + + +PID +--- + +The process number controller is used to allow a cgroup to stop any +new tasks from being fork()'d or clone()'d after a specified limit is +reached. + +The number of tasks in a cgroup can be exhausted in ways which other +controllers cannot prevent, thus warranting its own controller. For +example, a fork bomb is likely to exhaust the number of tasks before +hitting memory restrictions. + +Note that PIDs used in this controller refer to TIDs, process IDs as +used by the kernel. + + +PID Interface Files +~~~~~~~~~~~~~~~~~~~ + + pids.max + A read-write single value file which exists on non-root + cgroups. The default is "max". + + Hard limit of number of processes. + + pids.current + A read-only single value file which exists on all cgroups. + + The number of processes currently in the cgroup and its + descendants. + +Organisational operations are not blocked by cgroup policies, so it is +possible to have pids.current > pids.max. This can be done by either +setting the limit to be smaller than pids.current, or attaching enough +processes to the cgroup such that pids.current is larger than +pids.max. However, it is not possible to violate a cgroup PID policy +through fork() or clone(). These will return -EAGAIN if the creation +of a new process would cause a cgroup policy to be violated. + + +Device controller +----------------- + +Device controller manages access to device files. It includes both +creation of new device files (using mknod), and access to the +existing device files. + +Cgroup v2 device controller has no interface files and is implemented +on top of cgroup BPF. To control access to device files, a user may +create bpf programs of the BPF_CGROUP_DEVICE type and attach them +to cgroups. On an attempt to access a device file, corresponding +BPF programs will be executed, and depending on the return value +the attempt will succeed or fail with -EPERM. + +A BPF_CGROUP_DEVICE program takes a pointer to the bpf_cgroup_dev_ctx +structure, which describes the device access attempt: access type +(mknod/read/write) and device (type, major and minor numbers). +If the program returns 0, the attempt fails with -EPERM, otherwise +it succeeds. + +An example of BPF_CGROUP_DEVICE program may be found in the kernel +source tree in the tools/testing/selftests/bpf/dev_cgroup.c file. + + +RDMA +---- + +The "rdma" controller regulates the distribution and accounting of +of RDMA resources. + +RDMA Interface Files +~~~~~~~~~~~~~~~~~~~~ + + rdma.max + A readwrite nested-keyed file that exists for all the cgroups + except root that describes current configured resource limit + for a RDMA/IB device. + + Lines are keyed by device name and are not ordered. + Each line contains space separated resource name and its configured + limit that can be distributed. + + The following nested keys are defined. + + ========== ============================= + hca_handle Maximum number of HCA Handles + hca_object Maximum number of HCA Objects + ========== ============================= + + An example for mlx4 and ocrdma device follows:: + + mlx4_0 hca_handle=2 hca_object=2000 + ocrdma1 hca_handle=3 hca_object=max + + rdma.current + A read-only file that describes current resource usage. + It exists for all the cgroup except root. + + An example for mlx4 and ocrdma device follows:: + + mlx4_0 hca_handle=1 hca_object=20 + ocrdma1 hca_handle=1 hca_object=23 + + +Misc +---- + +perf_event +~~~~~~~~~~ + +perf_event controller, if not mounted on a legacy hierarchy, is +automatically enabled on the v2 hierarchy so that perf events can +always be filtered by cgroup v2 path. The controller can still be +moved to a legacy hierarchy after v2 hierarchy is populated. + + +Non-normative information +------------------------- + +This section contains information that isn't considered to be a part of +the stable kernel API and so is subject to change. + + +CPU controller root cgroup process behaviour +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When distributing CPU cycles in the root cgroup each thread in this +cgroup is treated as if it was hosted in a separate child cgroup of the +root cgroup. This child cgroup weight is dependent on its thread nice +level. + +For details of this mapping see sched_prio_to_weight array in +kernel/sched/core.c file (values from this array should be scaled +appropriately so the neutral - nice 0 - value is 100 instead of 1024). + + +IO controller root cgroup process behaviour +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Root cgroup processes are hosted in an implicit leaf child node. +When distributing IO resources this implicit child node is taken into +account as if it was a normal child cgroup of the root cgroup with a +weight value of 200. + + +Namespace +========= + +Basics +------ + +cgroup namespace provides a mechanism to virtualize the view of the +"/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone +flag can be used with clone(2) and unshare(2) to create a new cgroup +namespace. The process running inside the cgroup namespace will have +its "/proc/$PID/cgroup" output restricted to cgroupns root. The +cgroupns root is the cgroup of the process at the time of creation of +the cgroup namespace. + +Without cgroup namespace, the "/proc/$PID/cgroup" file shows the +complete path of the cgroup of a process. In a container setup where +a set of cgroups and namespaces are intended to isolate processes the +"/proc/$PID/cgroup" file may leak potential system level information +to the isolated processes. For Example:: + + # cat /proc/self/cgroup + 0::/batchjobs/container_id1 + +The path '/batchjobs/container_id1' can be considered as system-data +and undesirable to expose to the isolated processes. cgroup namespace +can be used to restrict visibility of this path. For example, before +creating a cgroup namespace, one would see:: + + # ls -l /proc/self/ns/cgroup + lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] + # cat /proc/self/cgroup + 0::/batchjobs/container_id1 + +After unsharing a new namespace, the view changes:: + + # ls -l /proc/self/ns/cgroup + lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] + # cat /proc/self/cgroup + 0::/ + +When some thread from a multi-threaded process unshares its cgroup +namespace, the new cgroupns gets applied to the entire process (all +the threads). This is natural for the v2 hierarchy; however, for the +legacy hierarchies, this may be unexpected. + +A cgroup namespace is alive as long as there are processes inside or +mounts pinning it. When the last usage goes away, the cgroup +namespace is destroyed. The cgroupns root and the actual cgroups +remain. + + +The Root and Views +------------------ + +The 'cgroupns root' for a cgroup namespace is the cgroup in which the +process calling unshare(2) is running. For example, if a process in +/batchjobs/container_id1 cgroup calls unshare, cgroup +/batchjobs/container_id1 becomes the cgroupns root. For the +init_cgroup_ns, this is the real root ('/') cgroup. + +The cgroupns root cgroup does not change even if the namespace creator +process later moves to a different cgroup:: + + # ~/unshare -c # unshare cgroupns in some cgroup + # cat /proc/self/cgroup + 0::/ + # mkdir sub_cgrp_1 + # echo 0 > sub_cgrp_1/cgroup.procs + # cat /proc/self/cgroup + 0::/sub_cgrp_1 + +Each process gets its namespace-specific view of "/proc/$PID/cgroup" + +Processes running inside the cgroup namespace will be able to see +cgroup paths (in /proc/self/cgroup) only inside their root cgroup. +From within an unshared cgroupns:: + + # sleep 100000 & + [1] 7353 + # echo 7353 > sub_cgrp_1/cgroup.procs + # cat /proc/7353/cgroup + 0::/sub_cgrp_1 + +From the initial cgroup namespace, the real cgroup path will be +visible:: + + $ cat /proc/7353/cgroup + 0::/batchjobs/container_id1/sub_cgrp_1 + +From a sibling cgroup namespace (that is, a namespace rooted at a +different cgroup), the cgroup path relative to its own cgroup +namespace root will be shown. For instance, if PID 7353's cgroup +namespace root is at '/batchjobs/container_id2', then it will see:: + + # cat /proc/7353/cgroup + 0::/../container_id2/sub_cgrp_1 + +Note that the relative path always starts with '/' to indicate that +its relative to the cgroup namespace root of the caller. + + +Migration and setns(2) +---------------------- + +Processes inside a cgroup namespace can move into and out of the +namespace root if they have proper access to external cgroups. For +example, from inside a namespace with cgroupns root at +/batchjobs/container_id1, and assuming that the global hierarchy is +still accessible inside cgroupns:: + + # cat /proc/7353/cgroup + 0::/sub_cgrp_1 + # echo 7353 > batchjobs/container_id2/cgroup.procs + # cat /proc/7353/cgroup + 0::/../container_id2 + +Note that this kind of setup is not encouraged. A task inside cgroup +namespace should only be exposed to its own cgroupns hierarchy. + +setns(2) to another cgroup namespace is allowed when: + +(a) the process has CAP_SYS_ADMIN against its current user namespace +(b) the process has CAP_SYS_ADMIN against the target cgroup + namespace's userns + +No implicit cgroup changes happen with attaching to another cgroup +namespace. It is expected that the someone moves the attaching +process under the target cgroup namespace root. + + +Interaction with Other Namespaces +--------------------------------- + +Namespace specific cgroup hierarchy can be mounted by a process +running inside a non-init cgroup namespace:: + + # mount -t cgroup2 none $MOUNT_POINT + +This will mount the unified cgroup hierarchy with cgroupns root as the +filesystem root. The process needs CAP_SYS_ADMIN against its user and +mount namespaces. + +The virtualization of /proc/self/cgroup file combined with restricting +the view of cgroup hierarchy by namespace-private cgroupfs mount +provides a properly isolated cgroup view inside the container. + + +Information on Kernel Programming +================================= + +This section contains kernel programming information in the areas +where interacting with cgroup is necessary. cgroup core and +controllers are not covered. + + +Filesystem Support for Writeback +-------------------------------- + +A filesystem can support cgroup writeback by updating +address_space_operations->writepage[s]() to annotate bio's using the +following two functions. + + wbc_init_bio(@wbc, @bio) + Should be called for each bio carrying writeback data and + associates the bio with the inode's owner cgroup. Can be + called anytime between bio allocation and submission. + + wbc_account_io(@wbc, @page, @bytes) + Should be called for each data segment being written out. + While this function doesn't care exactly when it's called + during the writeback session, it's the easiest and most + natural to call it as data segments are added to a bio. + +With writeback bio's annotated, cgroup support can be enabled per +super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for +selective disabling of cgroup writeback support which is helpful when +certain filesystem features, e.g. journaled data mode, are +incompatible. + +wbc_init_bio() binds the specified bio to its cgroup. Depending on +the configuration, the bio may be executed at a lower priority and if +the writeback session is holding shared resources, e.g. a journal +entry, may lead to priority inversion. There is no one easy solution +for the problem. Filesystems can try to work around specific problem +cases by skipping wbc_init_bio() or using bio_associate_blkcg() +directly. + + +Deprecated v1 Core Features +=========================== + +- Multiple hierarchies including named ones are not supported. + +- All v1 mount options are not supported. + +- The "tasks" file is removed and "cgroup.procs" is not sorted. + +- "cgroup.clone_children" is removed. + +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file + at the root instead. + + +Issues with v1 and Rationales for v2 +==================================== + +Multiple Hierarchies +-------------------- + +cgroup v1 allowed an arbitrary number of hierarchies and each +hierarchy could host any number of controllers. While this seemed to +provide a high level of flexibility, it wasn't useful in practice. + +For example, as there is only one instance of each controller, utility +type controllers such as freezer which can be useful in all +hierarchies could only be used in one. The issue is exacerbated by +the fact that controllers couldn't be moved to another hierarchy once +hierarchies were populated. Another issue was that all controllers +bound to a hierarchy were forced to have exactly the same view of the +hierarchy. It wasn't possible to vary the granularity depending on +the specific controller. + +In practice, these issues heavily limited which controllers could be +put on the same hierarchy and most configurations resorted to putting +each controller on its own hierarchy. Only closely related ones, such +as the cpu and cpuacct controllers, made sense to be put on the same +hierarchy. This often meant that userland ended up managing multiple +similar hierarchies repeating the same steps on each hierarchy +whenever a hierarchy management operation was necessary. + +Furthermore, support for multiple hierarchies came at a steep cost. +It greatly complicated cgroup core implementation but more importantly +the support for multiple hierarchies restricted how cgroup could be +used in general and what controllers was able to do. + +There was no limit on how many hierarchies there might be, which meant +that a thread's cgroup membership couldn't be described in finite +length. The key might contain any number of entries and was unlimited +in length, which made it highly awkward to manipulate and led to +addition of controllers which existed only to identify membership, +which in turn exacerbated the original problem of proliferating number +of hierarchies. + +Also, as a controller couldn't have any expectation regarding the +topologies of hierarchies other controllers might be on, each +controller had to assume that all other controllers were attached to +completely orthogonal hierarchies. This made it impossible, or at +least very cumbersome, for controllers to cooperate with each other. + +In most use cases, putting controllers on hierarchies which are +completely orthogonal to each other isn't necessary. What usually is +called for is the ability to have differing levels of granularity +depending on the specific controller. In other words, hierarchy may +be collapsed from leaf towards root when viewed from specific +controllers. For example, a given configuration might not care about +how memory is distributed beyond a certain level while still wanting +to control how CPU cycles are distributed. + + +Thread Granularity +------------------ + +cgroup v1 allowed threads of a process to belong to different cgroups. +This didn't make sense for some controllers and those controllers +ended up implementing different ways to ignore such situations but +much more importantly it blurred the line between API exposed to +individual applications and system management interface. + +Generally, in-process knowledge is available only to the process +itself; thus, unlike service-level organization of processes, +categorizing threads of a process requires active participation from +the application which owns the target process. + +cgroup v1 had an ambiguously defined delegation model which got abused +in combination with thread granularity. cgroups were delegated to +individual applications so that they can create and manage their own +sub-hierarchies and control resource distributions along them. This +effectively raised cgroup to the status of a syscall-like API exposed +to lay programs. + +First of all, cgroup has a fundamentally inadequate interface to be +exposed this way. For a process to access its own knobs, it has to +extract the path on the target hierarchy from /proc/self/cgroup, +construct the path by appending the name of the knob to the path, open +and then read and/or write to it. This is not only extremely clunky +and unusual but also inherently racy. There is no conventional way to +define transaction across the required steps and nothing can guarantee +that the process would actually be operating on its own sub-hierarchy. + +cgroup controllers implemented a number of knobs which would never be +accepted as public APIs because they were just adding control knobs to +system-management pseudo filesystem. cgroup ended up with interface +knobs which were not properly abstracted or refined and directly +revealed kernel internal details. These knobs got exposed to +individual applications through the ill-defined delegation mechanism +effectively abusing cgroup as a shortcut to implementing public APIs +without going through the required scrutiny. + +This was painful for both userland and kernel. Userland ended up with +misbehaving and poorly abstracted interfaces and kernel exposing and +locked into constructs inadvertently. + + +Competition Between Inner Nodes and Threads +------------------------------------------- + +cgroup v1 allowed threads to be in any cgroups which created an +interesting problem where threads belonging to a parent cgroup and its +children cgroups competed for resources. This was nasty as two +different types of entities competed and there was no obvious way to +settle it. Different controllers did different things. + +The cpu controller considered threads and cgroups as equivalents and +mapped nice levels to cgroup weights. This worked for some cases but +fell flat when children wanted to be allocated specific ratios of CPU +cycles and the number of internal threads fluctuated - the ratios +constantly changed as the number of competing entities fluctuated. +There also were other issues. The mapping from nice level to weight +wasn't obvious or universal, and there were various other knobs which +simply weren't available for threads. + +The io controller implicitly created a hidden leaf node for each +cgroup to host the threads. The hidden leaf had its own copies of all +the knobs with ``leaf_`` prefixed. While this allowed equivalent +control over internal threads, it was with serious drawbacks. It +always added an extra layer of nesting which wouldn't be necessary +otherwise, made the interface messy and significantly complicated the +implementation. + +The memory controller didn't have a way to control what happened +between internal tasks and child cgroups and the behavior was not +clearly defined. There were attempts to add ad-hoc behaviors and +knobs to tailor the behavior to specific workloads which would have +led to problems extremely difficult to resolve in the long term. + +Multiple controllers struggled with internal tasks and came up with +different ways to deal with it; unfortunately, all the approaches were +severely flawed and, furthermore, the widely different behaviors +made cgroup as a whole highly inconsistent. + +This clearly is a problem which needs to be addressed from cgroup core +in a uniform way. + + +Other Interface Issues +---------------------- + +cgroup v1 grew without oversight and developed a large number of +idiosyncrasies and inconsistencies. One issue on the cgroup core side +was how an empty cgroup was notified - a userland helper binary was +forked and executed for each event. The event delivery wasn't +recursive or delegatable. The limitations of the mechanism also led +to in-kernel event delivery filtering mechanism further complicating +the interface. + +Controller interfaces were problematic too. An extreme example is +controllers completely ignoring hierarchical organization and treating +all cgroups as if they were all located directly under the root +cgroup. Some controllers exposed a large amount of inconsistent +implementation details to userland. + +There also was no consistency across controllers. When a new cgroup +was created, some controllers defaulted to not imposing extra +restrictions while others disallowed any resource usage until +explicitly configured. Configuration knobs for the same type of +control used widely differing naming schemes and formats. Statistics +and information knobs were named arbitrarily and used different +formats and units even in the same controller. + +cgroup v2 establishes common conventions where appropriate and updates +controllers so that they expose minimal and consistent interfaces. + + +Controller Issues and Remedies +------------------------------ + +Memory +~~~~~~ + +The original lower boundary, the soft limit, is defined as a limit +that is per default unset. As a result, the set of cgroups that +global reclaim prefers is opt-in, rather than opt-out. The costs for +optimizing these mostly negative lookups are so high that the +implementation, despite its enormous size, does not even provide the +basic desirable behavior. First off, the soft limit has no +hierarchical meaning. All configured groups are organized in a global +rbtree and treated like equal peers, regardless where they are located +in the hierarchy. This makes subtree delegation impossible. Second, +the soft limit reclaim pass is so aggressive that it not just +introduces high allocation latencies into the system, but also impacts +system performance due to overreclaim, to the point where the feature +becomes self-defeating. + +The memory.low boundary on the other hand is a top-down allocated +reserve. A cgroup enjoys reclaim protection when it's within its low, +which makes delegation of subtrees possible. + +The original high boundary, the hard limit, is defined as a strict +limit that can not budge, even if the OOM killer has to be called. +But this generally goes against the goal of making the most out of the +available memory. The memory consumption of workloads varies during +runtime, and that requires users to overcommit. But doing that with a +strict upper limit requires either a fairly accurate prediction of the +working set size or adding slack to the limit. Since working set size +estimation is hard and error prone, and getting it wrong results in +OOM kills, most users tend to err on the side of a looser limit and +end up wasting precious resources. + +The memory.high boundary on the other hand can be set much more +conservatively. When hit, it throttles allocations by forcing them +into direct reclaim to work off the excess, but it never invokes the +OOM killer. As a result, a high boundary that is chosen too +aggressively will not terminate the processes, but instead it will +lead to gradual performance degradation. The user can monitor this +and make corrections until the minimal memory footprint that still +gives acceptable performance is found. + +In extreme cases, with many concurrent allocations and a complete +breakdown of reclaim progress within the group, the high boundary can +be exceeded. But even then it's mostly better to satisfy the +allocation from the slack available in other groups or the rest of the +system than killing the group. Otherwise, memory.max is there to +limit this type of spillover and ultimately contain buggy or even +malicious applications. + +Setting the original memory.limit_in_bytes below the current usage was +subject to a race condition, where concurrent charges could cause the +limit setting to fail. memory.max on the other hand will first set the +limit to prevent new charges, and then reclaim and OOM kill until the +new limit is met - or the task writing to memory.max is killed. + +The combined memory+swap accounting and limiting is replaced by real +control over swap space. + +The main argument for a combined memory+swap facility in the original +cgroup design was that global or parental pressure would always be +able to swap all anonymous memory of a child group, regardless of the +child's own (possibly untrusted) configuration. However, untrusted +groups can sabotage swapping by other means - such as referencing its +anonymous memory in a tight loop - and an admin can not assume full +swappability when overcommitting untrusted jobs. + +For trusted jobs, on the other hand, a combined counter is not an +intuitive userspace interface, and it flies in the face of the idea +that cgroup controllers should account and limit specific physical +resources. Swap space is a resource like all others in the system, +and that's why unified hierarchy allows distributing it separately. diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 78f8f00c369f22795016f875a3ace7db9c48950f..0873685bab0fc278b1c7938ab588796ff253f6ef 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -57,6 +57,7 @@ configure specific aspects of kernel behavior to your liking. :maxdepth: 1 initrd + cgroup-v2 serial-console braille-console parport @@ -69,9 +70,11 @@ configure specific aspects of kernel behavior to your liking. mono java ras + bcache pm/index thunderbolt LSM/index + mm/index .. only:: subproject and html diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index d7dd58ccf0d4650836f7be07eb040e621534bbe1..1370b424a45348a0f517b4427ddea1aa2ea9f80c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -106,11 +106,11 @@ use by PCI Format: ,... - acpi_mask_gpe= [HW,ACPI] + acpi_mask_gpe= [HW,ACPI] Due to the existence of _Lxx/_Exx, some GPEs triggered by unsupported hardware/firmware features can result in - GPE floodings that cannot be automatically disabled by - the GPE dispatcher. + GPE floodings that cannot be automatically disabled by + the GPE dispatcher. This facility can be used to prevent such uncontrolled GPE floodings. Format: @@ -256,7 +256,7 @@ (may crash computer or cause data corruption) ALSA [HW,ALSA] - See Documentation/sound/alsa/alsa-parameters.txt + See Documentation/sound/alsa-configuration.rst alignment= [KNL,ARM] Allow the default userspace alignment fault handler @@ -472,10 +472,10 @@ for platform specific values (SB1, Loongson3 and others). - ccw_timeout_log [S390] + ccw_timeout_log [S390] See Documentation/s390/CommonIO for details. - cgroup_disable= [KNL] Disable a particular controller + cgroup_disable= [KNL] Disable a particular controller Format: {name of the controller(s) to disable} The effects of cgroup_disable=foo are: - foo isn't auto-mounted if you mount all cgroups in @@ -518,7 +518,7 @@ those clocks in any way. This parameter is useful for debug and development, but should not be needed on a platform with proper driver support. For more - information, see Documentation/clk.txt. + information, see Documentation/driver-api/clk.rst. clock= [BUGS=X86-32, HW] gettimeofday clocksource override. [Deprecated] @@ -587,11 +587,6 @@ Sets the size of memory pool for coherent, atomic dma allocations, by default set to 256K. - code_bytes [X86] How many bytes of object code to print - in an oops report. - Range: 0 - 8192 - Default: 64 - com20020= [HW,NET] ARCnet - COM20020 chipset Format: [,[,[,[,[,]]]]] @@ -641,8 +636,8 @@ hvc Use the hypervisor console device . This is for both Xen and PowerPC hypervisors. - If the device connected to the port is not a TTY but a braille - device, prepend "brl," before the device type, for instance + If the device connected to the port is not a TTY but a braille + device, prepend "brl," before the device type, for instance console=brl,ttyS0 For now, only VisioBraille is supported. @@ -662,7 +657,7 @@ consoleblank= [KNL] The console blank (screen saver) timeout in seconds. A value of 0 disables the blank timer. - Defaults to 0. + Defaults to 0. coredump_filter= [KNL] Change the default value for @@ -730,7 +725,7 @@ or memory reserved is below 4G. cryptomgr.notests - [KNL] Disable crypto self-tests + [KNL] Disable crypto self-tests cs89x0_dma= [HW,NET] Format: @@ -746,7 +741,7 @@ Format: , See also Documentation/input/devices/joystick-parport.rst - ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot + ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot time. See Documentation/admin-guide/dynamic-debug-howto.rst for details. Deprecated, see dyndbg. @@ -833,7 +828,7 @@ causing system reset or hang due to sending INIT from AP to BSP. - disable_ddw [PPC/PSERIES] + disable_ddw [PPC/PSERIES] Disable Dynamic DMA Window support. Use this if to workaround buggy firmware. @@ -1025,6 +1020,12 @@ address. The serial port must already be setup and configured. Options are not yet supported. + qcom_geni, + Start an early, polled-mode console on a Qualcomm + Generic Interface (GENI) based serial port at the + specified address. The serial port must already be + setup and configured. Options are not yet supported. + earlyprintk= [X86,SH,ARM,M68k,S390] earlyprintk=vga earlyprintk=efi @@ -1188,7 +1189,7 @@ parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. - forcepae [X86-32] + forcepae [X86-32] Forcefully enable Physical Address Extension (PAE). Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. @@ -1247,7 +1248,7 @@ gamma= [HW,DRM] - gart_fix_e820= [X86_64] disable the fix e820 for K8 GART + gart_fix_e820= [X86_64] disable the fix e820 for K8 GART Format: off | on default: on @@ -1341,23 +1342,32 @@ x86-64 are 2M (when the CPU supports "pse") and 1G (when the CPU supports the "pdpe1gb" cpuinfo flag). - hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC) - terminal devices. Valid values: 0..8 - hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs. - If specified, z/VM IUCV HVC accepts connections - from listed z/VM user IDs only. + hung_task_panic= + [KNL] Should the hung task detector generate panics. + Format: + A nonzero value instructs the kernel to panic when a + hung task is detected. The default value is controlled + by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time + option. The value selected by this boot parameter can + be changed later by the kernel.hung_task_panic sysctl. + + hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC) + terminal devices. Valid values: 0..8 + hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs. + If specified, z/VM IUCV HVC accepts connections + from listed z/VM user IDs only. keep_bootcon [KNL] Do not unregister boot console at start. This is only useful for debugging when something happens in the window between unregistering the boot console and initializing the real console. - i2c_bus= [HW] Override the default board specific I2C bus speed - or register an additional I2C bus that is not - registered from board initialization code. - Format: - , + i2c_bus= [HW] Override the default board specific I2C bus speed + or register an additional I2C bus that is not + registered from board initialization code. + Format: + , i8042.debug [HW] Toggle i8042 debug mode i8042.unmask_kbd_data @@ -1386,7 +1396,7 @@ Default: only on s2r transitions on x86; most other architectures force reset to be always executed i8042.unlock [HW] Unlock (ignore) the keylock - i8042.kbdreset [HW] Reset device connected to KBD port + i8042.kbdreset [HW] Reset device connected to KBD port i810= [HW,DRM] @@ -1548,13 +1558,13 @@ programs exec'd, files mmap'd for exec, and all files opened for read by uid=0. - ima_template= [IMA] + ima_template= [IMA] Select one of defined IMA measurements template formats. Formats: { "ima" | "ima-ng" | "ima-sig" } Default: "ima-ng" ima_template_fmt= - [IMA] Define a custom template format. + [IMA] Define a custom template format. Format: { "field1|...|fieldN" } ima.ahash_minsize= [IMA] Minimum file size for asynchronous hash usage @@ -1597,7 +1607,7 @@ inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver Format: - int_pln_enable [x86] Enable power limit notification interrupt + int_pln_enable [x86] Enable power limit notification interrupt integrity_audit=[IMA] Format: { "0" | "1" } @@ -1650,39 +1660,39 @@ 0 disables intel_idle and fall back on acpi_idle. 1 to 9 specify maximum depth of C-state. - intel_pstate= [X86] - disable - Do not enable intel_pstate as the default - scaling driver for the supported processors - passive - Use intel_pstate as a scaling driver, but configure it - to work with generic cpufreq governors (instead of - enabling its internal governor). This mode cannot be - used along with the hardware-managed P-states (HWP) - feature. - force - Enable intel_pstate on systems that prohibit it by default - in favor of acpi-cpufreq. Forcing the intel_pstate driver - instead of acpi-cpufreq may disable platform features, such - as thermal controls and power capping, that rely on ACPI - P-States information being indicated to OSPM and therefore - should be used with caution. This option does not work with - processors that aren't supported by the intel_pstate driver - or on platforms that use pcc-cpufreq instead of acpi-cpufreq. - no_hwp - Do not enable hardware P state control (HWP) - if available. - hwp_only - Only load intel_pstate on systems which support - hardware P state control (HWP) if available. - support_acpi_ppc - Enforce ACPI _PPC performance limits. If the Fixed ACPI - Description Table, specifies preferred power management - profile as "Enterprise Server" or "Performance Server", - then this feature is turned on by default. - per_cpu_perf_limits - Allow per-logical-CPU P-State performance control limits using - cpufreq sysfs interface + intel_pstate= [X86] + disable + Do not enable intel_pstate as the default + scaling driver for the supported processors + passive + Use intel_pstate as a scaling driver, but configure it + to work with generic cpufreq governors (instead of + enabling its internal governor). This mode cannot be + used along with the hardware-managed P-states (HWP) + feature. + force + Enable intel_pstate on systems that prohibit it by default + in favor of acpi-cpufreq. Forcing the intel_pstate driver + instead of acpi-cpufreq may disable platform features, such + as thermal controls and power capping, that rely on ACPI + P-States information being indicated to OSPM and therefore + should be used with caution. This option does not work with + processors that aren't supported by the intel_pstate driver + or on platforms that use pcc-cpufreq instead of acpi-cpufreq. + no_hwp + Do not enable hardware P state control (HWP) + if available. + hwp_only + Only load intel_pstate on systems which support + hardware P state control (HWP) if available. + support_acpi_ppc + Enforce ACPI _PPC performance limits. If the Fixed ACPI + Description Table, specifies preferred power management + profile as "Enterprise Server" or "Performance Server", + then this feature is turned on by default. + per_cpu_perf_limits + Allow per-logical-CPU P-State performance control limits using + cpufreq sysfs interface intremap= [X86-64, Intel-IOMMU] on enable Interrupt Remapping (default) @@ -1705,7 +1715,6 @@ nopanic merge nomerge - forcesac soft pt [x86, IA-64] nobypass [PPC/POWERNV] @@ -2101,7 +2110,7 @@ * [no]ncqtrim: Turn off queued DSM TRIM. * nohrst, nosrst, norst: suppress hard, soft - and both resets. + and both resets. * rstonce: only attempt one reset during hot-unplug link recovery @@ -2289,7 +2298,7 @@ [KNL,SH] Allow user to override the default size for per-device physically contiguous DMA buffers. - memhp_default_state=online/offline + memhp_default_state=online/offline [KNL] Set the initial state for the memory hotplug onlining policy. If not specified, the default value is set according to the @@ -2674,6 +2683,9 @@ emulation library even if a 387 maths coprocessor is present. + no5lvl [X86-64] Disable 5-level paging mode. Forces + kernel to use 4-level paging instead. + no_console_suspend [HW] Never suspend the console Disable suspending of consoles during suspend and @@ -2843,7 +2855,7 @@ [X86,PV_OPS] Disable paravirtualized VMware scheduler clock and use the default one. - no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting. + no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting. steal time is computed, but won't influence scheduler behaviour @@ -2904,7 +2916,7 @@ notsc [BUGS=X86-32] Disable Time Stamp Counter nowatchdog [KNL] Disable both lockup detectors, i.e. - soft-lockup and NMI watchdog (hard-lockup). + soft-lockup and NMI watchdog (hard-lockup). nowb [ARM] @@ -2924,7 +2936,7 @@ If the dependencies are under your control, you can turn on cpu0_hotplug. - nps_mtm_hs_ctr= [KNL,ARC] + nps_mtm_hs_ctr= [KNL,ARC] This parameter sets the maximum duration, in cycles, each HW thread of the CTOP can run without interruptions, before HW switches it. @@ -2992,9 +3004,6 @@ This will also cause panics on machine check exceptions. Useful together with panic=30 to trigger a reboot. - OSS [HW,OSS] - See Documentation/sound/oss/oss-parameters.txt - page_owner= [KNL] Boot-time page_owner enabling option. Storage of the information about who allocated each page is disabled in default. With this switch, @@ -3065,7 +3074,7 @@ pci=option[,option...] [PCI] various PCI subsystem options: earlydump [X86] dump PCI config space before the kernel - changes anything + changes anything off [X86] don't probe for the PCI bus bios [X86-32] force use of PCI BIOS, don't access the hardware directly. Use this if your machine @@ -3153,7 +3162,7 @@ is enabled by default. If you need to use this, please report a bug. nocrs [X86] Ignore PCI host bridge windows from ACPI. - If you need to use this, please report a bug. + If you need to use this, please report a bug. routeirq Do IRQ routing for all PCI devices. This is normally done in pci_enable_device(), so this option is a temporary workaround @@ -3228,6 +3237,8 @@ on: Turn realloc on realloc same as realloc=on noari do not use PCIe ARI. + noats [PCIE, Intel-IOMMU, AMD-IOMMU] + do not use PCIe ATS (and IOMMU device IOTLB). pcie_scan_all Scan all possible PCIe devices. Otherwise we only look for one device below a PCIe downstream port. @@ -3996,7 +4007,7 @@ cache (risks via metadata attacks are mostly unchanged). Debug options disable merging on their own. - For more information see Documentation/vm/slub.txt. + For more information see Documentation/vm/slub.rst. slab_max_order= [MM, SLAB] Determines the maximum allowed order for slabs. @@ -4010,7 +4021,7 @@ slub_debug can create guard zones around objects and may poison objects when not in use. Also tracks the last alloc / free. For more information see - Documentation/vm/slub.txt. + Documentation/vm/slub.rst. slub_memcg_sysfs= [MM, SLUB] Determines whether to enable sysfs directories for @@ -4024,7 +4035,7 @@ Determines the maximum allowed order for slabs. A high setting may cause OOMs due to memory fragmentation. For more information see - Documentation/vm/slub.txt. + Documentation/vm/slub.rst. slub_min_objects= [MM, SLUB] The minimum number of objects per slab. SLUB will @@ -4033,12 +4044,12 @@ the number of objects indicated. The higher the number of objects the smaller the overhead of tracking slabs and the less frequently locks need to be acquired. - For more information see Documentation/vm/slub.txt. + For more information see Documentation/vm/slub.rst. slub_min_order= [MM, SLUB] Determines the minimum page order for slabs. Must be lower than slub_max_order. - For more information see Documentation/vm/slub.txt. + For more information see Documentation/vm/slub.rst. slub_nomerge [MM, SLUB] Same with slab_nomerge. This is supported for legacy. @@ -4399,7 +4410,7 @@ [FTRACE] Set and start specified trace events in order to facilitate early boot debugging. The event-list is a comma separated list of trace events to enable. See - also Documentation/trace/events.txt + also Documentation/trace/events.rst trace_options=[option-list] [FTRACE] Enable or disable tracer options at boot. @@ -4414,7 +4425,7 @@ trace_options=stacktrace - See also Documentation/trace/ftrace.txt "trace options" + See also Documentation/trace/ftrace.rst "trace options" section. tp_printk[FTRACE] @@ -4453,7 +4464,8 @@ Format: [always|madvise|never] Can be used to control the default behavior of the system with respect to transparent hugepages. - See Documentation/vm/transhuge.txt for more details. + See Documentation/admin-guide/mm/transhuge.rst + for more details. tsc= Disable clocksource stability checks for TSC. Format: @@ -4531,7 +4543,7 @@ usbcore.initial_descriptor_timeout= [USB] Specifies timeout for the initial 64-byte - USB_REQ_GET_DESCRIPTOR request in milliseconds + USB_REQ_GET_DESCRIPTOR request in milliseconds (default 5000 = 5.0 seconds). usbcore.nousb [USB] Disable the USB subsystem @@ -4912,3 +4924,8 @@ xirc2ps_cs= [NET,PCMCIA] Format: ,,,,,[,[,[,]]] + + xhci-hcd.quirks [USB,KNL] + A hex value specifying bitmask with supplemental xhci + host controller quirks. Meaning of each bit can be + consulted in header drivers/usb/host/xhci.h. diff --git a/Documentation/admin-guide/mm/concepts.rst b/Documentation/admin-guide/mm/concepts.rst new file mode 100644 index 0000000000000000000000000000000000000000..291699c810d4e7d99c9c803b01c514b0f63928ea --- /dev/null +++ b/Documentation/admin-guide/mm/concepts.rst @@ -0,0 +1,222 @@ +.. _mm_concepts: + +================= +Concepts overview +================= + +The memory management in Linux is complex system that evolved over the +years and included more and more functionality to support variety of +systems from MMU-less microcontrollers to supercomputers. The memory +management for systems without MMU is called ``nommu`` and it +definitely deserves a dedicated document, which hopefully will be +eventually written. Yet, although some of the concepts are the same, +here we assume that MMU is available and CPU can translate a virtual +address to a physical address. + +.. contents:: :local: + +Virtual Memory Primer +===================== + +The physical memory in a computer system is a limited resource and +even for systems that support memory hotplug there is a hard limit on +the amount of memory that can be installed. The physical memory is not +necessary contiguous, it might be accessible as a set of distinct +address ranges. Besides, different CPU architectures, and even +different implementations of the same architecture have different view +how these address ranges defined. + +All this makes dealing directly with physical memory quite complex and +to avoid this complexity a concept of virtual memory was developed. + +The virtual memory abstracts the details of physical memory from the +application software, allows to keep only needed information in the +physical memory (demand paging) and provides a mechanism for the +protection and controlled sharing of data between processes. + +With virtual memory, each and every memory access uses a virtual +address. When the CPU decodes the an instruction that reads (or +writes) from (or to) the system memory, it translates the `virtual` +address encoded in that instruction to a `physical` address that the +memory controller can understand. + +The physical system memory is divided into page frames, or pages. The +size of each page is architecture specific. Some architectures allow +selection of the page size from several supported values; this +selection is performed at the kernel build time by setting an +appropriate kernel configuration option. + +Each physical memory page can be mapped as one or more virtual +pages. These mappings are described by page tables that allow +translation from virtual address used by programs to real address in +the physical memory. The page tables organized hierarchically. + +The tables at the lowest level of the hierarchy contain physical +addresses of actual pages used by the software. The tables at higher +levels contain physical addresses of the pages belonging to the lower +levels. The pointer to the top level page table resides in a +register. When the CPU performs the address translation, it uses this +register to access the top level page table. The high bits of the +virtual address are used to index an entry in the top level page +table. That entry is then used to access the next level in the +hierarchy with the next bits of the virtual address as the index to +that level page table. The lowest bits in the virtual address define +the offset inside the actual page. + +Huge Pages +========== + +The address translation requires several memory accesses and memory +accesses are slow relatively to CPU speed. To avoid spending precious +processor cycles on the address translation, CPUs maintain a cache of +such translations called Translation Lookaside Buffer (or +TLB). Usually TLB is pretty scarce resource and applications with +large memory working set will experience performance hit because of +TLB misses. + +Many modern CPU architectures allow mapping of the memory pages +directly by the higher levels in the page table. For instance, on x86, +it is possible to map 2M and even 1G pages using entries in the second +and the third level page tables. In Linux such pages are called +`huge`. Usage of huge pages significantly reduces pressure on TLB, +improves TLB hit-rate and thus improves overall system performance. + +There are two mechanisms in Linux that enable mapping of the physical +memory with the huge pages. The first one is `HugeTLB filesystem`, or +hugetlbfs. It is a pseudo filesystem that uses RAM as its backing +store. For the files created in this filesystem the data resides in +the memory and mapped using huge pages. The hugetlbfs is described at +:ref:`Documentation/admin-guide/mm/hugetlbpage.rst `. + +Another, more recent, mechanism that enables use of the huge pages is +called `Transparent HugePages`, or THP. Unlike the hugetlbfs that +requires users and/or system administrators to configure what parts of +the system memory should and can be mapped by the huge pages, THP +manages such mappings transparently to the user and hence the +name. See +:ref:`Documentation/admin-guide/mm/transhuge.rst ` +for more details about THP. + +Zones +===== + +Often hardware poses restrictions on how different physical memory +ranges can be accessed. In some cases, devices cannot perform DMA to +all the addressable memory. In other cases, the size of the physical +memory exceeds the maximal addressable size of virtual memory and +special actions are required to access portions of the memory. Linux +groups memory pages into `zones` according to their possible +usage. For example, ZONE_DMA will contain memory that can be used by +devices for DMA, ZONE_HIGHMEM will contain memory that is not +permanently mapped into kernel's address space and ZONE_NORMAL will +contain normally addressed pages. + +The actual layout of the memory zones is hardware dependent as not all +architectures define all zones, and requirements for DMA are different +for different platforms. + +Nodes +===== + +Many multi-processor machines are NUMA - Non-Uniform Memory Access - +systems. In such systems the memory is arranged into banks that have +different access latency depending on the "distance" from the +processor. Each bank is referred as `node` and for each node Linux +constructs an independent memory management subsystem. A node has it's +own set of zones, lists of free and used pages and various statistics +counters. You can find more details about NUMA in +:ref:`Documentation/vm/numa.rst ` and in +:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst `. + +Page cache +========== + +The physical memory is volatile and the common case for getting data +into the memory is to read it from files. Whenever a file is read, the +data is put into the `page cache` to avoid expensive disk access on +the subsequent reads. Similarly, when one writes to a file, the data +is placed in the page cache and eventually gets into the backing +storage device. The written pages are marked as `dirty` and when Linux +decides to reuse them for other purposes, it makes sure to synchronize +the file contents on the device with the updated data. + +Anonymous Memory +================ + +The `anonymous memory` or `anonymous mappings` represent memory that +is not backed by a filesystem. Such mappings are implicitly created +for program's stack and heap or by explicit calls to mmap(2) system +call. Usually, the anonymous mappings only define virtual memory areas +that the program is allowed to access. The read accesses will result +in creation of a page table entry that references a special physical +page filled with zeroes. When the program performs a write, regular +physical page will be allocated to hold the written data. The page +will be marked dirty and if the kernel will decide to repurpose it, +the dirty page will be swapped out. + +Reclaim +======= + +Throughout the system lifetime, a physical page can be used for storing +different types of data. It can be kernel internal data structures, +DMA'able buffers for device drivers use, data read from a filesystem, +memory allocated by user space processes etc. + +Depending on the page usage it is treated differently by the Linux +memory management. The pages that can be freed at any time, either +because they cache the data available elsewhere, for instance, on a +hard disk, or because they can be swapped out, again, to the hard +disk, are called `reclaimable`. The most notable categories of the +reclaimable pages are page cache and anonymous memory. + +In most cases, the pages holding internal kernel data and used as DMA +buffers cannot be repurposed, and they remain pinned until freed by +their user. Such pages are called `unreclaimable`. However, in certain +circumstances, even pages occupied with kernel data structures can be +reclaimed. For instance, in-memory caches of filesystem metadata can +be re-read from the storage device and therefore it is possible to +discard them from the main memory when system is under memory +pressure. + +The process of freeing the reclaimable physical memory pages and +repurposing them is called (surprise!) `reclaim`. Linux can reclaim +pages either asynchronously or synchronously, depending on the state +of the system. When system is not loaded, most of the memory is free +and allocation request will be satisfied immediately from the free +pages supply. As the load increases, the amount of the free pages goes +down and when it reaches a certain threshold (high watermark), an +allocation request will awaken the ``kswapd`` daemon. It will +asynchronously scan memory pages and either just free them if the data +they contain is available elsewhere, or evict to the backing storage +device (remember those dirty pages?). As memory usage increases even +more and reaches another threshold - min watermark - an allocation +will trigger the `direct reclaim`. In this case allocation is stalled +until enough memory pages are reclaimed to satisfy the request. + +Compaction +========== + +As the system runs, tasks allocate and free the memory and it becomes +fragmented. Although with virtual memory it is possible to present +scattered physical pages as virtually contiguous range, sometimes it is +necessary to allocate large physically contiguous memory areas. Such +need may arise, for instance, when a device driver requires large +buffer for DMA, or when THP allocates a huge page. Memory `compaction` +addresses the fragmentation issue. This mechanism moves occupied pages +from the lower part of a memory zone to free pages in the upper part +of the zone. When a compaction scan is finished free pages are grouped +together at the beginning of the zone and allocations of large +physically contiguous areas become possible. + +Like reclaim, the compaction may happen asynchronously in ``kcompactd`` +daemon or synchronously as a result of memory allocation request. + +OOM killer +========== + +It may happen, that on a loaded machine memory will be exhausted. When +the kernel detects that the system runs out of memory (OOM) it invokes +`OOM killer`. Its mission is simple: all it has to do is to select a +task to sacrifice for the sake of the overall system health. The +selected task is killed in a hope that after it exits enough memory +will be freed to continue normal operation. diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst new file mode 100644 index 0000000000000000000000000000000000000000..1cc0bc78d10e525d541352ea189c9c925bb85991 --- /dev/null +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -0,0 +1,382 @@ +.. _hugetlbpage: + +============= +HugeTLB Pages +============= + +Overview +======== + +The intent of this file is to give a brief summary of hugetlbpage support in +the Linux kernel. This support is built on top of multiple page size support +that is provided by most modern architectures. For example, x86 CPUs normally +support 4K and 2M (1G if architecturally supported) page sizes, ia64 +architecture supports multiple page sizes 4K, 8K, 64K, 256K, 1M, 4M, 16M, +256M and ppc64 supports 4K and 16M. A TLB is a cache of virtual-to-physical +translations. Typically this is a very scarce resource on processor. +Operating systems try to make best use of limited number of TLB resources. +This optimization is more critical now as bigger and bigger physical memories +(several GBs) are more readily available. + +Users can use the huge page support in Linux kernel by either using the mmap +system call or standard SYSV shared memory system calls (shmget, shmat). + +First the Linux kernel needs to be built with the CONFIG_HUGETLBFS +(present under "File systems") and CONFIG_HUGETLB_PAGE (selected +automatically when CONFIG_HUGETLBFS is selected) configuration +options. + +The ``/proc/meminfo`` file provides information about the total number of +persistent hugetlb pages in the kernel's huge page pool. It also displays +default huge page size and information about the number of free, reserved +and surplus huge pages in the pool of huge pages of default size. +The huge page size is needed for generating the proper alignment and +size of the arguments to system calls that map huge page regions. + +The output of ``cat /proc/meminfo`` will include lines like:: + + HugePages_Total: uuu + HugePages_Free: vvv + HugePages_Rsvd: www + HugePages_Surp: xxx + Hugepagesize: yyy kB + Hugetlb: zzz kB + +where: + +HugePages_Total + is the size of the pool of huge pages. +HugePages_Free + is the number of huge pages in the pool that are not yet + allocated. +HugePages_Rsvd + is short for "reserved," and is the number of huge pages for + which a commitment to allocate from the pool has been made, + but no allocation has yet been made. Reserved huge pages + guarantee that an application will be able to allocate a + huge page from the pool of huge pages at fault time. +HugePages_Surp + is short for "surplus," and is the number of huge pages in + the pool above the value in ``/proc/sys/vm/nr_hugepages``. The + maximum number of surplus huge pages is controlled by + ``/proc/sys/vm/nr_overcommit_hugepages``. +Hugepagesize + is the default hugepage size (in Kb). +Hugetlb + is the total amount of memory (in kB), consumed by huge + pages of all sizes. + If huge pages of different sizes are in use, this number + will exceed HugePages_Total \* Hugepagesize. To get more + detailed information, please, refer to + ``/sys/kernel/mm/hugepages`` (described below). + + +``/proc/filesystems`` should also show a filesystem of type "hugetlbfs" +configured in the kernel. + +``/proc/sys/vm/nr_hugepages`` indicates the current number of "persistent" huge +pages in the kernel's huge page pool. "Persistent" huge pages will be +returned to the huge page pool when freed by a task. A user with root +privileges can dynamically allocate more or free some persistent huge pages +by increasing or decreasing the value of ``nr_hugepages``. + +Pages that are used as huge pages are reserved inside the kernel and cannot +be used for other purposes. Huge pages cannot be swapped out under +memory pressure. + +Once a number of huge pages have been pre-allocated to the kernel huge page +pool, a user with appropriate privilege can use either the mmap system call +or shared memory system calls to use the huge pages. See the discussion of +:ref:`Using Huge Pages `, below. + +The administrator can allocate persistent huge pages on the kernel boot +command line by specifying the "hugepages=N" parameter, where 'N' = the +number of huge pages requested. This is the most reliable method of +allocating huge pages as memory has not yet become fragmented. + +Some platforms support multiple huge page sizes. To allocate huge pages +of a specific size, one must precede the huge pages boot command parameters +with a huge page size selection parameter "hugepagesz=". must +be specified in bytes with optional scale suffix [kKmMgG]. The default huge +page size may be selected with the "default_hugepagesz=" boot parameter. + +When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages`` +indicates the current number of pre-allocated huge pages of the default size. +Thus, one can use the following command to dynamically allocate/deallocate +default sized persistent huge pages:: + + echo 20 > /proc/sys/vm/nr_hugepages + +This command will try to adjust the number of default sized huge pages in the +huge page pool to 20, allocating or freeing huge pages, as required. + +On a NUMA platform, the kernel will attempt to distribute the huge page pool +over all the set of allowed nodes specified by the NUMA memory policy of the +task that modifies ``nr_hugepages``. The default for the allowed nodes--when the +task has default memory policy--is all on-line nodes with memory. Allowed +nodes with insufficient available, contiguous memory for a huge page will be +silently skipped when allocating persistent huge pages. See the +:ref:`discussion below ` +of the interaction of task memory policy, cpusets and per node attributes +with the allocation and freeing of persistent huge pages. + +The success or failure of huge page allocation depends on the amount of +physically contiguous memory that is present in system at the time of the +allocation attempt. If the kernel is unable to allocate huge pages from +some nodes in a NUMA system, it will attempt to make up the difference by +allocating extra pages on other nodes with sufficient available contiguous +memory, if any. + +System administrators may want to put this command in one of the local rc +init files. This will enable the kernel to allocate huge pages early in +the boot process when the possibility of getting physical contiguous pages +is still very high. Administrators can verify the number of huge pages +actually allocated by checking the sysctl or meminfo. To check the per node +distribution of huge pages in a NUMA system, use:: + + cat /sys/devices/system/node/node*/meminfo | fgrep Huge + +``/proc/sys/vm/nr_overcommit_hugepages`` specifies how large the pool of +huge pages can grow, if more huge pages than ``/proc/sys/vm/nr_hugepages`` are +requested by applications. Writing any non-zero value into this file +indicates that the hugetlb subsystem is allowed to try to obtain that +number of "surplus" huge pages from the kernel's normal page pool, when the +persistent huge page pool is exhausted. As these surplus huge pages become +unused, they are freed back to the kernel's normal page pool. + +When increasing the huge page pool size via ``nr_hugepages``, any existing +surplus pages will first be promoted to persistent huge pages. Then, additional +huge pages will be allocated, if necessary and if possible, to fulfill +the new persistent huge page pool size. + +The administrator may shrink the pool of persistent huge pages for +the default huge page size by setting the ``nr_hugepages`` sysctl to a +smaller value. The kernel will attempt to balance the freeing of huge pages +across all nodes in the memory policy of the task modifying ``nr_hugepages``. +Any free huge pages on the selected nodes will be freed back to the kernel's +normal page pool. + +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` such that +it becomes less than the number of huge pages in use will convert the balance +of the in-use huge pages to surplus huge pages. This will occur even if +the number of surplus pages would exceed the overcommit value. As long as +this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is +increased sufficiently, or the surplus huge pages go out of use and are freed-- +no more surplus huge pages will be allowed to be allocated. + +With support for multiple huge page pools at run-time available, much of +the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in +sysfs. +The ``/proc`` interfaces discussed above have been retained for backwards +compatibility. The root huge page control directory in sysfs is:: + + /sys/kernel/mm/hugepages + +For each huge page size supported by the running kernel, a subdirectory +will exist, of the form:: + + hugepages-${size}kB + +Inside each of these directories, the same set of files will exist:: + + nr_hugepages + nr_hugepages_mempolicy + nr_overcommit_hugepages + free_hugepages + resv_hugepages + surplus_hugepages + +which function as described above for the default huge page-sized case. + +.. _mem_policy_and_hp_alloc: + +Interaction of Task Memory Policy with Huge Page Allocation/Freeing +=================================================================== + +Whether huge pages are allocated and freed via the ``/proc`` interface or +the ``/sysfs`` interface using the ``nr_hugepages_mempolicy`` attribute, the +NUMA nodes from which huge pages are allocated or freed are controlled by the +NUMA memory policy of the task that modifies the ``nr_hugepages_mempolicy`` +sysctl or attribute. When the ``nr_hugepages`` attribute is used, mempolicy +is ignored. + +The recommended method to allocate or free huge pages to/from the kernel +huge page pool, using the ``nr_hugepages`` example above, is:: + + numactl --interleave echo 20 \ + >/proc/sys/vm/nr_hugepages_mempolicy + +or, more succinctly:: + + numactl -m echo 20 >/proc/sys/vm/nr_hugepages_mempolicy + +This will allocate or free ``abs(20 - nr_hugepages)`` to or from the nodes +specified in , depending on whether number of persistent huge pages +is initially less than or greater than 20, respectively. No huge pages will be +allocated nor freed on any node not included in the specified . + +When adjusting the persistent hugepage count via ``nr_hugepages_mempolicy``, any +memory policy mode--bind, preferred, local or interleave--may be used. The +resulting effect on persistent huge page allocation is as follows: + +#. Regardless of mempolicy mode [see + :ref:`Documentation/admin-guide/mm/numa_memory_policy.rst `], + persistent huge pages will be distributed across the node or nodes + specified in the mempolicy as if "interleave" had been specified. + However, if a node in the policy does not contain sufficient contiguous + memory for a huge page, the allocation will not "fallback" to the nearest + neighbor node with sufficient contiguous memory. To do this would cause + undesirable imbalance in the distribution of the huge page pool, or + possibly, allocation of persistent huge pages on nodes not allowed by + the task's memory policy. + +#. One or more nodes may be specified with the bind or interleave policy. + If more than one node is specified with the preferred policy, only the + lowest numeric id will be used. Local policy will select the node where + the task is running at the time the nodes_allowed mask is constructed. + For local policy to be deterministic, the task must be bound to a cpu or + cpus in a single node. Otherwise, the task could be migrated to some + other node at any time after launch and the resulting node will be + indeterminate. Thus, local policy is not very useful for this purpose. + Any of the other mempolicy modes may be used to specify a single node. + +#. The nodes allowed mask will be derived from any non-default task mempolicy, + whether this policy was set explicitly by the task itself or one of its + ancestors, such as numactl. This means that if the task is invoked from a + shell with non-default policy, that policy will be used. One can specify a + node list of "all" with numactl --interleave or --membind [-m] to achieve + interleaving over all nodes in the system or cpuset. + +#. Any task mempolicy specified--e.g., using numactl--will be constrained by + the resource limits of any cpuset in which the task runs. Thus, there will + be no way for a task with non-default policy running in a cpuset with a + subset of the system nodes to allocate huge pages outside the cpuset + without first moving to a cpuset that contains all of the desired nodes. + +#. Boot-time huge page allocation attempts to distribute the requested number + of huge pages over all on-lines nodes with memory. + +Per Node Hugepages Attributes +============================= + +A subset of the contents of the root huge page control directory in sysfs, +described above, will be replicated under each the system device of each +NUMA node with memory in:: + + /sys/devices/system/node/node[0-9]*/hugepages/ + +Under this directory, the subdirectory for each supported huge page size +contains the following attribute files:: + + nr_hugepages + free_hugepages + surplus_hugepages + +The free\_' and surplus\_' attribute files are read-only. They return the number +of free and surplus [overcommitted] huge pages, respectively, on the parent +node. + +The ``nr_hugepages`` attribute returns the total number of huge pages on the +specified node. When this attribute is written, the number of persistent huge +pages on the parent node will be adjusted to the specified value, if sufficient +resources exist, regardless of the task's mempolicy or cpuset constraints. + +Note that the number of overcommit and reserve pages remain global quantities, +as we don't know until fault time, when the faulting task's mempolicy is +applied, from which node the huge page allocation will be attempted. + +.. _using_huge_pages: + +Using Huge Pages +================ + +If the user applications are going to request huge pages using mmap system +call, then it is required that system administrator mount a file system of +type hugetlbfs:: + + mount -t hugetlbfs \ + -o uid=,gid=,mode=,pagesize=,size=,\ + min_size=,nr_inodes= none /mnt/huge + +This command mounts a (pseudo) filesystem of type hugetlbfs on the directory +``/mnt/huge``. Any file created on ``/mnt/huge`` uses huge pages. + +The ``uid`` and ``gid`` options sets the owner and group of the root of the +file system. By default the ``uid`` and ``gid`` of the current process +are taken. + +The ``mode`` option sets the mode of root of file system to value & 01777. +This value is given in octal. By default the value 0755 is picked. + +If the platform supports multiple huge page sizes, the ``pagesize`` option can +be used to specify the huge page size and associated pool. ``pagesize`` +is specified in bytes. If ``pagesize`` is not specified the platform's +default huge page size and associated pool will be used. + +The ``size`` option sets the maximum value of memory (huge pages) allowed +for that filesystem (``/mnt/huge``). The ``size`` option can be specified +in bytes, or as a percentage of the specified huge page pool (``nr_hugepages``). +The size is rounded down to HPAGE_SIZE boundary. + +The ``min_size`` option sets the minimum value of memory (huge pages) allowed +for the filesystem. ``min_size`` can be specified in the same way as ``size``, +either bytes or a percentage of the huge page pool. +At mount time, the number of huge pages specified by ``min_size`` are reserved +for use by the filesystem. +If there are not enough free huge pages available, the mount will fail. +As huge pages are allocated to the filesystem and freed, the reserve count +is adjusted so that the sum of allocated and reserved huge pages is always +at least ``min_size``. + +The option ``nr_inodes`` sets the maximum number of inodes that ``/mnt/huge`` +can use. + +If the ``size``, ``min_size`` or ``nr_inodes`` option is not provided on +command line then no limits are set. + +For ``pagesize``, ``size``, ``min_size`` and ``nr_inodes`` options, you can +use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. +For example, size=2K has the same meaning as size=2048. + +While read system calls are supported on files that reside on hugetlb +file systems, write system calls are not. + +Regular chown, chgrp, and chmod commands (with right permissions) could be +used to change the file attributes on hugetlbfs. + +Also, it is important to note that no such mount command is required if +applications are going to use only shmat/shmget system calls or mmap with +MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see +:ref:`map_hugetlb ` below. + +Users who wish to use hugetlb memory via shared memory segment should be +members of a supplementary group and system admin needs to configure that gid +into ``/proc/sys/vm/hugetlb_shm_group``. It is possible for same or different +applications to use any combination of mmaps and shm* calls, though the mount of +filesystem will be required for using mmap calls without MAP_HUGETLB. + +Syscalls that operate on memory backed by hugetlb pages only have their lengths +aligned to the native page size of the processor; they will normally fail with +errno set to EINVAL or exclude hugetlb pages that extend beyond the length if +not hugepage aligned. For example, munmap(2) will fail if memory is backed by +a hugetlb page and the length is smaller than the hugepage size. + + +Examples +======== + +.. _map_hugetlb: + +``map_hugetlb`` + see tools/testing/selftests/vm/map_hugetlb.c + +``hugepage-shm`` + see tools/testing/selftests/vm/hugepage-shm.c + +``hugepage-mmap`` + see tools/testing/selftests/vm/hugepage-mmap.c + +The `libhugetlbfs`_ library provides a wide range of userspace tools +to help with huge page usability, environment setup, and control. + +.. _libhugetlbfs: https://github.com/libhugetlbfs/libhugetlbfs diff --git a/Documentation/admin-guide/mm/idle_page_tracking.rst b/Documentation/admin-guide/mm/idle_page_tracking.rst new file mode 100644 index 0000000000000000000000000000000000000000..6f7b7ca1add3b915bce58dcafc2e83d7186055cb --- /dev/null +++ b/Documentation/admin-guide/mm/idle_page_tracking.rst @@ -0,0 +1,116 @@ +.. _idle_page_tracking: + +================== +Idle Page Tracking +================== + +Motivation +========== + +The idle page tracking feature allows to track which memory pages are being +accessed by a workload and which are idle. This information can be useful for +estimating the workload's working set size, which, in turn, can be taken into +account when configuring the workload parameters, setting memory cgroup limits, +or deciding where to place the workload within a compute cluster. + +It is enabled by CONFIG_IDLE_PAGE_TRACKING=y. + +.. _user_api: + +User API +======== + +The idle page tracking API is located at ``/sys/kernel/mm/page_idle``. +Currently, it consists of the only read-write file, +``/sys/kernel/mm/page_idle/bitmap``. + +The file implements a bitmap where each bit corresponds to a memory page. The +bitmap is represented by an array of 8-byte integers, and the page at PFN #i is +mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is +set, the corresponding page is idle. + +A page is considered idle if it has not been accessed since it was marked idle +(for more details on what "accessed" actually means see the :ref:`Implementation +Details ` section). +To mark a page idle one has to set the bit corresponding to +the page by writing to the file. A value written to the file is OR-ed with the +current bitmap value. + +Only accesses to user memory pages are tracked. These are pages mapped to a +process address space, page cache and buffer pages, swap cache pages. For other +page types (e.g. SLAB pages) an attempt to mark a page idle is silently ignored, +and hence such pages are never reported idle. + +For huge pages the idle flag is set only on the head page, so one has to read +``/proc/kpageflags`` in order to correctly count idle huge pages. + +Reading from or writing to ``/sys/kernel/mm/page_idle/bitmap`` will return +-EINVAL if you are not starting the read/write on an 8-byte boundary, or +if the size of the read/write is not a multiple of 8 bytes. Writing to +this file beyond max PFN will return -ENXIO. + +That said, in order to estimate the amount of pages that are not used by a +workload one should: + + 1. Mark all the workload's pages as idle by setting corresponding bits in + ``/sys/kernel/mm/page_idle/bitmap``. The pages can be found by reading + ``/proc/pid/pagemap`` if the workload is represented by a process, or by + filtering out alien pages using ``/proc/kpagecgroup`` in case the workload + is placed in a memory cgroup. + + 2. Wait until the workload accesses its working set. + + 3. Read ``/sys/kernel/mm/page_idle/bitmap`` and count the number of bits set. + If one wants to ignore certain types of pages, e.g. mlocked pages since they + are not reclaimable, he or she can filter them out using + ``/proc/kpageflags``. + +See :ref:`Documentation/admin-guide/mm/pagemap.rst ` for more +information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and +``/proc/kpagecgroup``. + +.. _impl_details: + +Implementation Details +====================== + +The kernel internally keeps track of accesses to user memory pages in order to +reclaim unreferenced pages first on memory shortage conditions. A page is +considered referenced if it has been recently accessed via a process address +space, in which case one or more PTEs it is mapped to will have the Accessed bit +set, or marked accessed explicitly by the kernel (see mark_page_accessed()). The +latter happens when: + + - a userspace process reads or writes a page using a system call (e.g. read(2) + or write(2)) + + - a page that is used for storing filesystem buffers is read or written, + because a process needs filesystem metadata stored in it (e.g. lists a + directory tree) + + - a page is accessed by a device driver using get_user_pages() + +When a dirty page is written to swap or disk as a result of memory reclaim or +exceeding the dirty memory limit, it is not marked referenced. + +The idle memory tracking feature adds a new page flag, the Idle flag. This flag +is set manually, by writing to ``/sys/kernel/mm/page_idle/bitmap`` (see the +:ref:`User API ` +section), and cleared automatically whenever a page is referenced as defined +above. + +When a page is marked idle, the Accessed bit must be cleared in all PTEs it is +mapped to, otherwise we will not be able to detect accesses to the page coming +from a process address space. To avoid interference with the reclaimer, which, +as noted above, uses the Accessed bit to promote actively referenced pages, one +more page flag is introduced, the Young flag. When the PTE Accessed bit is +cleared as a result of setting or updating a page's Idle flag, the Young flag +is set on the page. The reclaimer treats the Young flag as an extra PTE +Accessed bit and therefore will consider such a page as referenced. + +Since the idle memory tracking feature is based on the memory reclaimer logic, +it only works with pages that are on an LRU list, other pages are silently +ignored. That means it will ignore a user memory page if it is isolated, but +since there are usually not many of them, it should not affect the overall +result noticeably. In order not to stall scanning of the idle page bitmap, +locked pages may be skipped too. diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..ceead68c2df744818021c8dc40b4efc7d4065bb9 --- /dev/null +++ b/Documentation/admin-guide/mm/index.rst @@ -0,0 +1,36 @@ +================= +Memory Management +================= + +Linux memory management subsystem is responsible, as the name implies, +for managing the memory in the system. This includes implemnetation of +virtual memory and demand paging, memory allocation both for kernel +internal structures and user space programms, mapping of files into +processes address space and many other cool things. + +Linux memory management is a complex system with many configurable +settings. Most of these settings are available via ``/proc`` +filesystem and can be quired and adjusted using ``sysctl``. These APIs +are described in Documentation/sysctl/vm.txt and in `man 5 proc`_. + +.. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html + +Linux memory management has its own jargon and if you are not yet +familiar with it, consider reading +:ref:`Documentation/admin-guide/mm/concepts.rst `. + +Here we document in detail how to interact with various mechanisms in +the Linux memory management. + +.. toctree:: + :maxdepth: 1 + + concepts + hugetlbpage + idle_page_tracking + ksm + numa_memory_policy + pagemap + soft-dirty + transhuge + userfaultfd diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst new file mode 100644 index 0000000000000000000000000000000000000000..9303786632d145ff5e88a26bf01662a454f7572e --- /dev/null +++ b/Documentation/admin-guide/mm/ksm.rst @@ -0,0 +1,189 @@ +.. _admin_guide_ksm: + +======================= +Kernel Samepage Merging +======================= + +Overview +======== + +KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y, +added to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation, +and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/ + +KSM was originally developed for use with KVM (where it was known as +Kernel Shared Memory), to fit more virtual machines into physical memory, +by sharing the data common between them. But it can be useful to any +application which generates many instances of the same data. + +The KSM daemon ksmd periodically scans those areas of user memory +which have been registered with it, looking for pages of identical +content which can be replaced by a single write-protected page (which +is automatically copied if a process later wants to update its +content). The amount of pages that KSM daemon scans in a single pass +and the time between the passes are configured using :ref:`sysfs +intraface ` + +KSM only merges anonymous (private) pages, never pagecache (file) pages. +KSM's merged pages were originally locked into kernel memory, but can now +be swapped out just like other user pages (but sharing is broken when they +are swapped back in: ksmd must rediscover their identity and merge again). + +Controlling KSM with madvise +============================ + +KSM only operates on those areas of address space which an application +has advised to be likely candidates for merging, by using the madvise(2) +system call:: + + int madvise(addr, length, MADV_MERGEABLE) + +The app may call + +:: + + int madvise(addr, length, MADV_UNMERGEABLE) + +to cancel that advice and restore unshared pages: whereupon KSM +unmerges whatever it merged in that range. Note: this unmerging call +may suddenly require more memory than is available - possibly failing +with EAGAIN, but more probably arousing the Out-Of-Memory killer. + +If KSM is not configured into the running kernel, madvise MADV_MERGEABLE +and MADV_UNMERGEABLE simply fail with EINVAL. If the running kernel was +built with CONFIG_KSM=y, those calls will normally succeed: even if the +the KSM daemon is not currently running, MADV_MERGEABLE still registers +the range for whenever the KSM daemon is started; even if the range +cannot contain any pages which KSM could actually merge; even if +MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE. + +If a region of memory must be split into at least one new MADV_MERGEABLE +or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process +will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt). + +Like other madvise calls, they are intended for use on mapped areas of +the user address space: they will report ENOMEM if the specified range +includes unmapped gaps (though working on the intervening mapped areas), +and might fail with EAGAIN if not enough memory for internal structures. + +Applications should be considerate in their use of MADV_MERGEABLE, +restricting its use to areas likely to benefit. KSM's scans may use a lot +of processing power: some installations will disable KSM for that reason. + +.. _ksm_sysfs: + +KSM daemon sysfs interface +========================== + +The KSM daemon is controlled by sysfs files in ``/sys/kernel/mm/ksm/``, +readable by all but writable only by root: + +pages_to_scan + how many pages to scan before ksmd goes to sleep + e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``. + + Default: 100 (chosen for demonstration purposes) + +sleep_millisecs + how many milliseconds ksmd should sleep before next scan + e.g. ``echo 20 > /sys/kernel/mm/ksm/sleep_millisecs`` + + Default: 20 (chosen for demonstration purposes) + +merge_across_nodes + specifies if pages from different NUMA nodes can be merged. + When set to 0, ksm merges only pages which physically reside + in the memory area of same NUMA node. That brings lower + latency to access of shared pages. Systems with more nodes, at + significant NUMA distances, are likely to benefit from the + lower latency of setting 0. Smaller systems, which need to + minimize memory usage, are likely to benefit from the greater + sharing of setting 1 (default). You may wish to compare how + your system performs under each setting, before deciding on + which to use. ``merge_across_nodes`` setting can be changed only + when there are no ksm shared pages in the system: set run 2 to + unmerge pages first, then to 1 after changing + ``merge_across_nodes``, to remerge according to the new setting. + + Default: 1 (merging across nodes as in earlier releases) + +run + * set to 0 to stop ksmd from running but keep merged pages, + * set to 1 to run ksmd e.g. ``echo 1 > /sys/kernel/mm/ksm/run``, + * set to 2 to stop ksmd and unmerge all pages currently merged, but + leave mergeable areas registered for next run. + + Default: 0 (must be changed to 1 to activate KSM, except if + CONFIG_SYSFS is disabled) + +use_zero_pages + specifies whether empty pages (i.e. allocated pages that only + contain zeroes) should be treated specially. When set to 1, + empty pages are merged with the kernel zero page(s) instead of + with each other as it would happen normally. This can improve + the performance on architectures with coloured zero pages, + depending on the workload. Care should be taken when enabling + this setting, as it can potentially degrade the performance of + KSM for some workloads, for example if the checksums of pages + candidate for merging match the checksum of an empty + page. This setting can be changed at any time, it is only + effective for pages merged after the change. + + Default: 0 (normal KSM behaviour as in earlier releases) + +max_page_sharing + Maximum sharing allowed for each KSM page. This enforces a + deduplication limit to avoid high latency for virtual memory + operations that involve traversal of the virtual mappings that + share the KSM page. The minimum value is 2 as a newly created + KSM page will have at least two sharers. The higher this value + the faster KSM will merge the memory and the higher the + deduplication factor will be, but the slower the worst case + virtual mappings traversal could be for any given KSM + page. Slowing down this traversal means there will be higher + latency for certain virtual memory operations happening during + swapping, compaction, NUMA balancing and page migration, in + turn decreasing responsiveness for the caller of those virtual + memory operations. The scheduler latency of other tasks not + involved with the VM operations doing the virtual mappings + traversal is not affected by this parameter as these + traversals are always schedule friendly themselves. + +stable_node_chains_prune_millisecs + specifies how frequently KSM checks the metadata of the pages + that hit the deduplication limit for stale information. + Smaller milllisecs values will free up the KSM metadata with + lower latency, but they will make ksmd use more CPU during the + scan. It's a noop if not a single KSM page hit the + ``max_page_sharing`` yet. + +The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``: + +pages_shared + how many shared pages are being used +pages_sharing + how many more sites are sharing them i.e. how much saved +pages_unshared + how many pages unique but repeatedly checked for merging +pages_volatile + how many pages changing too fast to be placed in a tree +full_scans + how many times all mergeable areas have been scanned +stable_node_chains + the number of KSM pages that hit the ``max_page_sharing`` limit +stable_node_dups + number of duplicated KSM pages + +A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good +sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing`` +indicates wasted effort. ``pages_volatile`` embraces several +different kinds of activity, but a high proportion there would also +indicate poor use of madvise MADV_MERGEABLE. + +The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the +``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must +be increased accordingly. + +-- +Izik Eidus, +Hugh Dickins, 17 Nov 2009 diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst new file mode 100644 index 0000000000000000000000000000000000000000..d78c5b315f72695618d7b8519cd6797d008f6ed6 --- /dev/null +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst @@ -0,0 +1,495 @@ +.. _numa_memory_policy: + +================== +NUMA Memory Policy +================== + +What is NUMA Memory Policy? +============================ + +In the Linux kernel, "memory policy" determines from which node the kernel will +allocate memory in a NUMA system or in an emulated NUMA system. Linux has +supported platforms with Non-Uniform Memory Access architectures since 2.4.?. +The current memory policy support was added to Linux 2.6 around May 2004. This +document attempts to describe the concepts and APIs of the 2.6 memory policy +support. + +Memory policies should not be confused with cpusets +(``Documentation/cgroup-v1/cpusets.txt``) +which is an administrative mechanism for restricting the nodes from which +memory may be allocated by a set of processes. Memory policies are a +programming interface that a NUMA-aware application can take advantage of. When +both cpusets and policies are applied to a task, the restrictions of the cpuset +takes priority. See :ref:`Memory Policies and cpusets ` +below for more details. + +Memory Policy Concepts +====================== + +Scope of Memory Policies +------------------------ + +The Linux kernel supports _scopes_ of memory policy, described here from +most general to most specific: + +System Default Policy + this policy is "hard coded" into the kernel. It is the policy + that governs all page allocations that aren't controlled by + one of the more specific policy scopes discussed below. When + the system is "up and running", the system default policy will + use "local allocation" described below. However, during boot + up, the system default policy will be set to interleave + allocations across all nodes with "sufficient" memory, so as + not to overload the initial boot node with boot-time + allocations. + +Task/Process Policy + this is an optional, per-task policy. When defined for a + specific task, this policy controls all page allocations made + by or on behalf of the task that aren't controlled by a more + specific scope. If a task does not define a task policy, then + all page allocations that would have been controlled by the + task policy "fall back" to the System Default Policy. + + The task policy applies to the entire address space of a task. Thus, + it is inheritable, and indeed is inherited, across both fork() + [clone() w/o the CLONE_VM flag] and exec*(). This allows a parent task + to establish the task policy for a child task exec()'d from an + executable image that has no awareness of memory policy. See the + :ref:`Memory Policy APIs ` section, + below, for an overview of the system call + that a task may use to set/change its task/process policy. + + In a multi-threaded task, task policies apply only to the thread + [Linux kernel task] that installs the policy and any threads + subsequently created by that thread. Any sibling threads existing + at the time a new task policy is installed retain their current + policy. + + A task policy applies only to pages allocated after the policy is + installed. Any pages already faulted in by the task when the task + changes its task policy remain where they were allocated based on + the policy at the time they were allocated. + +.. _vma_policy: + +VMA Policy + A "VMA" or "Virtual Memory Area" refers to a range of a task's + virtual address space. A task may define a specific policy for a range + of its virtual address space. See the + :ref:`Memory Policy APIs ` section, + below, for an overview of the mbind() system call used to set a VMA + policy. + + A VMA policy will govern the allocation of pages that back + this region of the address space. Any regions of the task's + address space that don't have an explicit VMA policy will fall + back to the task policy, which may itself fall back to the + System Default Policy. + + VMA policies have a few complicating details: + + * VMA policy applies ONLY to anonymous pages. These include + pages allocated for anonymous segments, such as the task + stack and heap, and any regions of the address space + mmap()ed with the MAP_ANONYMOUS flag. If a VMA policy is + applied to a file mapping, it will be ignored if the mapping + used the MAP_SHARED flag. If the file mapping used the + MAP_PRIVATE flag, the VMA policy will only be applied when + an anonymous page is allocated on an attempt to write to the + mapping-- i.e., at Copy-On-Write. + + * VMA policies are shared between all tasks that share a + virtual address space--a.k.a. threads--independent of when + the policy is installed; and they are inherited across + fork(). However, because VMA policies refer to a specific + region of a task's address space, and because the address + space is discarded and recreated on exec*(), VMA policies + are NOT inheritable across exec(). Thus, only NUMA-aware + applications may use VMA policies. + + * A task may install a new VMA policy on a sub-range of a + previously mmap()ed region. When this happens, Linux splits + the existing virtual memory area into 2 or 3 VMAs, each with + it's own policy. + + * By default, VMA policy applies only to pages allocated after + the policy is installed. Any pages already faulted into the + VMA range remain where they were allocated based on the + policy at the time they were allocated. However, since + 2.6.16, Linux supports page migration via the mbind() system + call, so that page contents can be moved to match a newly + installed policy. + +Shared Policy + Conceptually, shared policies apply to "memory objects" mapped + shared into one or more tasks' distinct address spaces. An + application installs shared policies the same way as VMA + policies--using the mbind() system call specifying a range of + virtual addresses that map the shared object. However, unlike + VMA policies, which can be considered to be an attribute of a + range of a task's address space, shared policies apply + directly to the shared object. Thus, all tasks that attach to + the object share the policy, and all pages allocated for the + shared object, by any task, will obey the shared policy. + + As of 2.6.22, only shared memory segments, created by shmget() or + mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy. When shared + policy support was added to Linux, the associated data structures were + added to hugetlbfs shmem segments. At the time, hugetlbfs did not + support allocation at fault time--a.k.a lazy allocation--so hugetlbfs + shmem segments were never "hooked up" to the shared policy support. + Although hugetlbfs segments now support lazy allocation, their support + for shared policy has not been completed. + + As mentioned above in :ref:`VMA policies ` section, + allocations of page cache pages for regular files mmap()ed + with MAP_SHARED ignore any VMA policy installed on the virtual + address range backed by the shared file mapping. Rather, + shared page cache pages, including pages backing private + mappings that have not yet been written by the task, follow + task policy, if any, else System Default Policy. + + The shared policy infrastructure supports different policies on subset + ranges of the shared object. However, Linux still splits the VMA of + the task that installs the policy for each range of distinct policy. + Thus, different tasks that attach to a shared memory segment can have + different VMA configurations mapping that one shared object. This + can be seen by examining the /proc//numa_maps of tasks sharing + a shared memory region, when one task has installed shared policy on + one or more ranges of the region. + +Components of Memory Policies +----------------------------- + +A NUMA memory policy consists of a "mode", optional mode flags, and +an optional set of nodes. The mode determines the behavior of the +policy, the optional mode flags determine the behavior of the mode, +and the optional set of nodes can be viewed as the arguments to the +policy behavior. + +Internally, memory policies are implemented by a reference counted +structure, struct mempolicy. Details of this structure will be +discussed in context, below, as required to explain the behavior. + +NUMA memory policy supports the following 4 behavioral modes: + +Default Mode--MPOL_DEFAULT + This mode is only used in the memory policy APIs. Internally, + MPOL_DEFAULT is converted to the NULL memory policy in all + policy scopes. Any existing non-default policy will simply be + removed when MPOL_DEFAULT is specified. As a result, + MPOL_DEFAULT means "fall back to the next most specific policy + scope." + + For example, a NULL or default task policy will fall back to the + system default policy. A NULL or default vma policy will fall + back to the task policy. + + When specified in one of the memory policy APIs, the Default mode + does not use the optional set of nodes. + + It is an error for the set of nodes specified for this policy to + be non-empty. + +MPOL_BIND + This mode specifies that memory must come from the set of + nodes specified by the policy. Memory will be allocated from + the node in the set with sufficient free memory that is + closest to the node where the allocation takes place. + +MPOL_PREFERRED + This mode specifies that the allocation should be attempted + from the single node specified in the policy. If that + allocation fails, the kernel will search other nodes, in order + of increasing distance from the preferred node based on + information provided by the platform firmware. + + Internally, the Preferred policy uses a single node--the + preferred_node member of struct mempolicy. When the internal + mode flag MPOL_F_LOCAL is set, the preferred_node is ignored + and the policy is interpreted as local allocation. "Local" + allocation policy can be viewed as a Preferred policy that + starts at the node containing the cpu where the allocation + takes place. + + It is possible for the user to specify that local allocation + is always preferred by passing an empty nodemask with this + mode. If an empty nodemask is passed, the policy cannot use + the MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags + described below. + +MPOL_INTERLEAVED + This mode specifies that page allocations be interleaved, on a + page granularity, across the nodes specified in the policy. + This mode also behaves slightly differently, based on the + context where it is used: + + For allocation of anonymous pages and shared memory pages, + Interleave mode indexes the set of nodes specified by the + policy using the page offset of the faulting address into the + segment [VMA] containing the address modulo the number of + nodes specified by the policy. It then attempts to allocate a + page, starting at the selected node, as if the node had been + specified by a Preferred policy or had been selected by a + local allocation. That is, allocation will follow the per + node zonelist. + + For allocation of page cache pages, Interleave mode indexes + the set of nodes specified by the policy using a node counter + maintained per task. This counter wraps around to the lowest + specified node after it reaches the highest specified node. + This will tend to spread the pages out over the nodes + specified by the policy based on the order in which they are + allocated, rather than based on any page offset into an + address range or file. During system boot up, the temporary + interleaved system default policy works in this mode. + +NUMA memory policy supports the following optional mode flags: + +MPOL_F_STATIC_NODES + This flag specifies that the nodemask passed by + the user should not be remapped if the task or VMA's set of allowed + nodes changes after the memory policy has been defined. + + Without this flag, any time a mempolicy is rebound because of a + change in the set of allowed nodes, the node (Preferred) or + nodemask (Bind, Interleave) is remapped to the new set of + allowed nodes. This may result in nodes being used that were + previously undesired. + + With this flag, if the user-specified nodes overlap with the + nodes allowed by the task's cpuset, then the memory policy is + applied to their intersection. If the two sets of nodes do not + overlap, the Default policy is used. + + For example, consider a task that is attached to a cpuset with + mems 1-3 that sets an Interleave policy over the same set. If + the cpuset's mems change to 3-5, the Interleave will now occur + over nodes 3, 4, and 5. With this flag, however, since only node + 3 is allowed from the user's nodemask, the "interleave" only + occurs over that node. If no nodes from the user's nodemask are + now allowed, the Default behavior is used. + + MPOL_F_STATIC_NODES cannot be combined with the + MPOL_F_RELATIVE_NODES flag. It also cannot be used for + MPOL_PREFERRED policies that were created with an empty nodemask + (local allocation). + +MPOL_F_RELATIVE_NODES + This flag specifies that the nodemask passed + by the user will be mapped relative to the set of the task or VMA's + set of allowed nodes. The kernel stores the user-passed nodemask, + and if the allowed nodes changes, then that original nodemask will + be remapped relative to the new set of allowed nodes. + + Without this flag (and without MPOL_F_STATIC_NODES), anytime a + mempolicy is rebound because of a change in the set of allowed + nodes, the node (Preferred) or nodemask (Bind, Interleave) is + remapped to the new set of allowed nodes. That remap may not + preserve the relative nature of the user's passed nodemask to its + set of allowed nodes upon successive rebinds: a nodemask of + 1,3,5 may be remapped to 7-9 and then to 1-3 if the set of + allowed nodes is restored to its original state. + + With this flag, the remap is done so that the node numbers from + the user's passed nodemask are relative to the set of allowed + nodes. In other words, if nodes 0, 2, and 4 are set in the user's + nodemask, the policy will be effected over the first (and in the + Bind or Interleave case, the third and fifth) nodes in the set of + allowed nodes. The nodemask passed by the user represents nodes + relative to task or VMA's set of allowed nodes. + + If the user's nodemask includes nodes that are outside the range + of the new set of allowed nodes (for example, node 5 is set in + the user's nodemask when the set of allowed nodes is only 0-3), + then the remap wraps around to the beginning of the nodemask and, + if not already set, sets the node in the mempolicy nodemask. + + For example, consider a task that is attached to a cpuset with + mems 2-5 that sets an Interleave policy over the same set with + MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the + interleave now occurs over nodes 3,5-7. If the cpuset's mems + then change to 0,2-3,5, then the interleave occurs over nodes + 0,2-3,5. + + Thanks to the consistent remapping, applications preparing + nodemasks to specify memory policies using this flag should + disregard their current, actual cpuset imposed memory placement + and prepare the nodemask as if they were always located on + memory nodes 0 to N-1, where N is the number of memory nodes the + policy is intended to manage. Let the kernel then remap to the + set of memory nodes allowed by the task's cpuset, as that may + change over time. + + MPOL_F_RELATIVE_NODES cannot be combined with the + MPOL_F_STATIC_NODES flag. It also cannot be used for + MPOL_PREFERRED policies that were created with an empty nodemask + (local allocation). + +Memory Policy Reference Counting +================================ + +To resolve use/free races, struct mempolicy contains an atomic reference +count field. Internal interfaces, mpol_get()/mpol_put() increment and +decrement this reference count, respectively. mpol_put() will only free +the structure back to the mempolicy kmem cache when the reference count +goes to zero. + +When a new memory policy is allocated, its reference count is initialized +to '1', representing the reference held by the task that is installing the +new policy. When a pointer to a memory policy structure is stored in another +structure, another reference is added, as the task's reference will be dropped +on completion of the policy installation. + +During run-time "usage" of the policy, we attempt to minimize atomic operations +on the reference count, as this can lead to cache lines bouncing between cpus +and NUMA nodes. "Usage" here means one of the following: + +1) querying of the policy, either by the task itself [using the get_mempolicy() + API discussed below] or by another task using the /proc//numa_maps + interface. + +2) examination of the policy to determine the policy mode and associated node + or node lists, if any, for page allocation. This is considered a "hot + path". Note that for MPOL_BIND, the "usage" extends across the entire + allocation process, which may sleep during page reclaimation, because the + BIND policy nodemask is used, by reference, to filter ineligible nodes. + +We can avoid taking an extra reference during the usages listed above as +follows: + +1) we never need to get/free the system default policy as this is never + changed nor freed, once the system is up and running. + +2) for querying the policy, we do not need to take an extra reference on the + target task's task policy nor vma policies because we always acquire the + task's mm's mmap_sem for read during the query. The set_mempolicy() and + mbind() APIs [see below] always acquire the mmap_sem for write when + installing or replacing task or vma policies. Thus, there is no possibility + of a task or thread freeing a policy while another task or thread is + querying it. + +3) Page allocation usage of task or vma policy occurs in the fault path where + we hold them mmap_sem for read. Again, because replacing the task or vma + policy requires that the mmap_sem be held for write, the policy can't be + freed out from under us while we're using it for page allocation. + +4) Shared policies require special consideration. One task can replace a + shared memory policy while another task, with a distinct mmap_sem, is + querying or allocating a page based on the policy. To resolve this + potential race, the shared policy infrastructure adds an extra reference + to the shared policy during lookup while holding a spin lock on the shared + policy management structure. This requires that we drop this extra + reference when we're finished "using" the policy. We must drop the + extra reference on shared policies in the same query/allocation paths + used for non-shared policies. For this reason, shared policies are marked + as such, and the extra reference is dropped "conditionally"--i.e., only + for shared policies. + + Because of this extra reference counting, and because we must lookup + shared policies in a tree structure under spinlock, shared policies are + more expensive to use in the page allocation path. This is especially + true for shared policies on shared memory regions shared by tasks running + on different NUMA nodes. This extra overhead can be avoided by always + falling back to task or system default policy for shared memory regions, + or by prefaulting the entire shared memory region into memory and locking + it down. However, this might not be appropriate for all applications. + +.. _memory_policy_apis: + +Memory Policy APIs +================== + +Linux supports 3 system calls for controlling memory policy. These APIS +always affect only the calling task, the calling task's address space, or +some shared object mapped into the calling task's address space. + +.. note:: + the headers that define these APIs and the parameter data types for + user space applications reside in a package that is not part of the + Linux kernel. The kernel system call interfaces, with the 'sys\_' + prefix, are defined in ; the mode and flag + definitions are defined in . + +Set [Task] Memory Policy:: + + long set_mempolicy(int mode, const unsigned long *nmask, + unsigned long maxnode); + +Set's the calling task's "task/process memory policy" to mode +specified by the 'mode' argument and the set of nodes defined by +'nmask'. 'nmask' points to a bit mask of node ids containing at least +'maxnode' ids. Optional mode flags may be passed by combining the +'mode' argument with the flag (for example: MPOL_INTERLEAVE | +MPOL_F_STATIC_NODES). + +See the set_mempolicy(2) man page for more details + + +Get [Task] Memory Policy or Related Information:: + + long get_mempolicy(int *mode, + const unsigned long *nmask, unsigned long maxnode, + void *addr, int flags); + +Queries the "task/process memory policy" of the calling task, or the +policy or location of a specified virtual address, depending on the +'flags' argument. + +See the get_mempolicy(2) man page for more details + + +Install VMA/Shared Policy for a Range of Task's Address Space:: + + long mbind(void *start, unsigned long len, int mode, + const unsigned long *nmask, unsigned long maxnode, + unsigned flags); + +mbind() installs the policy specified by (mode, nmask, maxnodes) as a +VMA policy for the range of the calling task's address space specified +by the 'start' and 'len' arguments. Additional actions may be +requested via the 'flags' argument. + +See the mbind(2) man page for more details. + +Memory Policy Command Line Interface +==================================== + +Although not strictly part of the Linux implementation of memory policy, +a command line tool, numactl(8), exists that allows one to: + ++ set the task policy for a specified program via set_mempolicy(2), fork(2) and + exec(2) + ++ set the shared policy for a shared memory segment via mbind(2) + +The numactl(8) tool is packaged with the run-time version of the library +containing the memory policy system call wrappers. Some distributions +package the headers and compile-time libraries in a separate development +package. + +.. _mem_pol_and_cpusets: + +Memory Policies and cpusets +=========================== + +Memory policies work within cpusets as described above. For memory policies +that require a node or set of nodes, the nodes are restricted to the set of +nodes whose memories are allowed by the cpuset constraints. If the nodemask +specified for the policy contains nodes that are not allowed by the cpuset and +MPOL_F_RELATIVE_NODES is not used, the intersection of the set of nodes +specified for the policy and the set of nodes with memory is used. If the +result is the empty set, the policy is considered invalid and cannot be +installed. If MPOL_F_RELATIVE_NODES is used, the policy's nodes are mapped +onto and folded into the task's set of allowed nodes as previously described. + +The interaction of memory policies and cpusets can be problematic when tasks +in two cpusets share access to a memory region, such as shared memory segments +created by shmget() of mmap() with the MAP_ANONYMOUS and MAP_SHARED flags, and +any of the tasks install shared policy on the region, only nodes whose +memories are allowed in both cpusets may be used in the policies. Obtaining +this information requires "stepping outside" the memory policy APIs to use the +cpuset information and requires that one know in what cpusets other task might +be attaching to the shared region. Furthermore, if the cpusets' allowed +memory sets are disjoint, "local" allocation is the only valid policy. diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst new file mode 100644 index 0000000000000000000000000000000000000000..577af85beb410014690bfb51bd10b372b272d995 --- /dev/null +++ b/Documentation/admin-guide/mm/pagemap.rst @@ -0,0 +1,201 @@ +.. _pagemap: + +============================= +Examining Process Page Tables +============================= + +pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow +userspace programs to examine the page tables and related information by +reading files in ``/proc``. + +There are four components to pagemap: + + * ``/proc/pid/pagemap``. This file lets a userspace process find out which + physical frame each virtual page is mapped to. It contains one 64-bit + value for each virtual page, containing the following data (from + ``fs/proc/task_mmu.c``, above pagemap_read): + + * Bits 0-54 page frame number (PFN) if present + * Bits 0-4 swap type if swapped + * Bits 5-54 swap offset if swapped + * Bit 55 pte is soft-dirty (see + :ref:`Documentation/admin-guide/mm/soft-dirty.rst `) + * Bit 56 page exclusively mapped (since 4.2) + * Bits 57-60 zero + * Bit 61 page is file-page or shared-anon (since 3.5) + * Bit 62 page swapped + * Bit 63 page present + + Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs. + In 4.0 and 4.1 opens by unprivileged fail with -EPERM. Starting from + 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN. + Reason: information about PFNs helps in exploiting Rowhammer vulnerability. + + If the page is not present but in swap, then the PFN contains an + encoding of the swap file number and the page's offset into the + swap. Unmapped pages return a null PFN. This allows determining + precisely which pages are mapped (or in swap) and comparing mapped + pages between processes. + + Efficient users of this interface will use ``/proc/pid/maps`` to + determine which areas of memory are actually mapped and llseek to + skip over unmapped regions. + + * ``/proc/kpagecount``. This file contains a 64-bit count of the number of + times each page is mapped, indexed by PFN. + + * ``/proc/kpageflags``. This file contains a 64-bit set of flags for each + page, indexed by PFN. + + The flags are (from ``fs/proc/page.c``, above kpageflags_read): + + 0. LOCKED + 1. ERROR + 2. REFERENCED + 3. UPTODATE + 4. DIRTY + 5. LRU + 6. ACTIVE + 7. SLAB + 8. WRITEBACK + 9. RECLAIM + 10. BUDDY + 11. MMAP + 12. ANON + 13. SWAPCACHE + 14. SWAPBACKED + 15. COMPOUND_HEAD + 16. COMPOUND_TAIL + 17. HUGE + 18. UNEVICTABLE + 19. HWPOISON + 20. NOPAGE + 21. KSM + 22. THP + 23. BALLOON + 24. ZERO_PAGE + 25. IDLE + + * ``/proc/kpagecgroup``. This file contains a 64-bit inode number of the + memory cgroup each page is charged to, indexed by PFN. Only available when + CONFIG_MEMCG is set. + +Short descriptions to the page flags +==================================== + +0 - LOCKED + page is being locked for exclusive access, e.g. by undergoing read/write IO +7 - SLAB + page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator + When compound page is used, SLUB/SLQB will only set this flag on the head + page; SLOB will not flag it at all. +10 - BUDDY + a free memory block managed by the buddy system allocator + The buddy system organizes free memory in blocks of various orders. + An order N block has 2^N physically contiguous pages, with the BUDDY flag + set for and _only_ for the first page. +15 - COMPOUND_HEAD + A compound page with order N consists of 2^N physically contiguous pages. + A compound page with order 2 takes the form of "HTTT", where H donates its + head page and T donates its tail page(s). The major consumers of compound + pages are hugeTLB pages + (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst `), + the SLUB etc. memory allocators and various device drivers. + However in this interface, only huge/giga pages are made visible + to end users. +16 - COMPOUND_TAIL + A compound page tail (see description above). +17 - HUGE + this is an integral part of a HugeTLB page +19 - HWPOISON + hardware detected memory corruption on this page: don't touch the data! +20 - NOPAGE + no page frame exists at the requested address +21 - KSM + identical memory pages dynamically shared between one or more processes +22 - THP + contiguous pages which construct transparent hugepages +23 - BALLOON + balloon compaction page +24 - ZERO_PAGE + zero page for pfn_zero or huge_zero page +25 - IDLE + page has not been accessed since it was marked idle (see + :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst `). + Note that this flag may be stale in case the page was accessed via + a PTE. To make sure the flag is up-to-date one has to read + ``/sys/kernel/mm/page_idle/bitmap`` first. + +IO related page flags +--------------------- + +1 - ERROR + IO error occurred +3 - UPTODATE + page has up-to-date data + ie. for file backed page: (in-memory data revision >= on-disk one) +4 - DIRTY + page has been written to, hence contains new data + i.e. for file backed page: (in-memory data revision > on-disk one) +8 - WRITEBACK + page is being synced to disk + +LRU related page flags +---------------------- + +5 - LRU + page is in one of the LRU lists +6 - ACTIVE + page is in the active LRU list +18 - UNEVICTABLE + page is in the unevictable (non-)LRU list It is somehow pinned and + not a candidate for LRU page reclaims, e.g. ramfs pages, + shmctl(SHM_LOCK) and mlock() memory segments +2 - REFERENCED + page has been referenced since last LRU list enqueue/requeue +9 - RECLAIM + page will be reclaimed soon after its pageout IO completed +11 - MMAP + a memory mapped page +12 - ANON + a memory mapped page that is not part of a file +13 - SWAPCACHE + page is mapped to swap space, i.e. has an associated swap entry +14 - SWAPBACKED + page is backed by swap/RAM + +The page-types tool in the tools/vm directory can be used to query the +above flags. + +Using pagemap to do something useful +==================================== + +The general procedure for using pagemap to find out about a process' memory +usage goes like this: + + 1. Read ``/proc/pid/maps`` to determine which parts of the memory space are + mapped to what. + 2. Select the maps you are interested in -- all of them, or a particular + library, or the stack or the heap, etc. + 3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine. + 4. Read a u64 for each page from pagemap. + 5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``. For each PFN you + just read, seek to that entry in the file, and read the data you want. + +For example, to find the "unique set size" (USS), which is the amount of +memory that a process is using that is not shared with any other process, +you can go through every map in the process, find the PFNs, look those up +in kpagecount, and tally up the number of pages that are only referenced +once. + +Other notes +=========== + +Reading from any of the files will return -EINVAL if you are not starting +the read on an 8-byte boundary (e.g., if you sought an odd number of bytes +into the file), or if the size of the read is not a multiple of 8 bytes. + +Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is +always 12 at most architectures). Since Linux 3.11 their meaning changes +after first clear of soft-dirty bits. Since Linux 4.2 they are used for +flags unconditionally. diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst new file mode 100644 index 0000000000000000000000000000000000000000..cb0cfd6672fa79186b5818425e605cf03e904e0e --- /dev/null +++ b/Documentation/admin-guide/mm/soft-dirty.rst @@ -0,0 +1,47 @@ +.. _soft_dirty: + +=============== +Soft-Dirty PTEs +=============== + +The soft-dirty is a bit on a PTE which helps to track which pages a task +writes to. In order to do this tracking one should + + 1. Clear soft-dirty bits from the task's PTEs. + + This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the + task in question. + + 2. Wait some time. + + 3. Read soft-dirty bits from the PTEs. + + This is done by reading from the ``/proc/PID/pagemap``. The bit 55 of the + 64-bit qword is the soft-dirty one. If set, the respective PTE was + written to since step 1. + + +Internally, to do this tracking, the writable bit is cleared from PTEs +when the soft-dirty bit is cleared. So, after this, when the task tries to +modify a page at some virtual address the #PF occurs and the kernel sets +the soft-dirty bit on the respective PTE. + +Note, that although all the task's address space is marked as r/o after the +soft-dirty bits clear, the #PF-s that occur after that are processed fast. +This is so, since the pages are still mapped to physical memory, and thus all +the kernel does is finds this fact out and puts both writable and soft-dirty +bits on the PTE. + +While in most cases tracking memory changes by #PF-s is more than enough +there is still a scenario when we can lose soft dirty bits -- a task +unmaps a previously mapped memory region and then maps a new one at exactly +the same place. When unmap is called, the kernel internally clears PTE values +including soft dirty bits. To notify user space application about such +memory region renewal the kernel always marks new memory regions (and +expanded regions) as soft dirty. + +This feature is actively used by the checkpoint-restore project. You +can find more details about it on http://criu.org + + +-- Pavel Emelyanov, Apr 9, 2013 diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst new file mode 100644 index 0000000000000000000000000000000000000000..7ab93a8404b9de0c223e74c28713da66b7f49f5a --- /dev/null +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -0,0 +1,418 @@ +.. _admin_guide_transhuge: + +============================ +Transparent Hugepage Support +============================ + +Objective +========= + +Performance critical computing applications dealing with large memory +working sets are already running on top of libhugetlbfs and in turn +hugetlbfs. Transparent HugePage Support (THP) is an alternative mean of +using huge pages for the backing of virtual memory with huge pages +that supports the automatic promotion and demotion of page sizes and +without the shortcomings of hugetlbfs. + +Currently THP only works for anonymous memory mappings and tmpfs/shmem. +But in the future it can expand to other filesystems. + +.. note:: + in the examples below we presume that the basic page size is 4K and + the huge page size is 2M, although the actual numbers may vary + depending on the CPU architecture. + +The reason applications are running faster is because of two +factors. The first factor is almost completely irrelevant and it's not +of significant interest because it'll also have the downside of +requiring larger clear-page copy-page in page faults which is a +potentially negative effect. The first factor consists in taking a +single page fault for each 2M virtual region touched by userland (so +reducing the enter/exit kernel frequency by a 512 times factor). This +only matters the first time the memory is accessed for the lifetime of +a memory mapping. The second long lasting and much more important +factor will affect all subsequent accesses to the memory for the whole +runtime of the application. The second factor consist of two +components: + +1) the TLB miss will run faster (especially with virtualization using + nested pagetables but almost always also on bare metal without + virtualization) + +2) a single TLB entry will be mapping a much larger amount of virtual + memory in turn reducing the number of TLB misses. With + virtualization and nested pagetables the TLB can be mapped of + larger size only if both KVM and the Linux guest are using + hugepages but a significant speedup already happens if only one of + the two is using hugepages just because of the fact the TLB miss is + going to run faster. + +THP can be enabled system wide or restricted to certain tasks or even +memory ranges inside task's address space. Unless THP is completely +disabled, there is ``khugepaged`` daemon that scans memory and +collapses sequences of basic pages into huge pages. + +The THP behaviour is controlled via :ref:`sysfs ` +interface and using madivse(2) and prctl(2) system calls. + +Transparent Hugepage Support maximizes the usefulness of free memory +if compared to the reservation approach of hugetlbfs by allowing all +unused memory to be used as cache or other movable (or even unmovable +entities). It doesn't require reservation to prevent hugepage +allocation failures to be noticeable from userland. It allows paging +and all other advanced VM features to be available on the +hugepages. It requires no modifications for applications to take +advantage of it. + +Applications however can be further optimized to take advantage of +this feature, like for example they've been optimized before to avoid +a flood of mmap system calls for every malloc(4k). Optimizing userland +is by far not mandatory and khugepaged already can take care of long +lived page allocations even for hugepage unaware applications that +deals with large amounts of memory. + +In certain cases when hugepages are enabled system wide, application +may end up allocating more memory resources. An application may mmap a +large region but only touch 1 byte of it, in that case a 2M page might +be allocated instead of a 4k page for no good. This is why it's +possible to disable hugepages system-wide and to only have them inside +MADV_HUGEPAGE madvise regions. + +Embedded systems should enable hugepages only inside madvise regions +to eliminate any risk of wasting any precious byte of memory and to +only run faster. + +Applications that gets a lot of benefit from hugepages and that don't +risk to lose memory by using hugepages, should use +madvise(MADV_HUGEPAGE) on their critical mmapped regions. + +.. _thp_sysfs: + +sysfs +===== + +Global THP controls +------------------- + +Transparent Hugepage Support for anonymous memory can be entirely disabled +(mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE +regions (to avoid the risk of consuming more memory resources) or enabled +system wide. This can be achieved with one of:: + + echo always >/sys/kernel/mm/transparent_hugepage/enabled + echo madvise >/sys/kernel/mm/transparent_hugepage/enabled + echo never >/sys/kernel/mm/transparent_hugepage/enabled + +It's also possible to limit defrag efforts in the VM to generate +anonymous hugepages in case they're not immediately free to madvise +regions or to never try to defrag memory and simply fallback to regular +pages unless hugepages are immediately available. Clearly if we spend CPU +time to defrag memory, we would expect to gain even more by the fact we +use hugepages later instead of regular pages. This isn't always +guaranteed, but it may be more likely in case the allocation is for a +MADV_HUGEPAGE region. + +:: + + echo always >/sys/kernel/mm/transparent_hugepage/defrag + echo defer >/sys/kernel/mm/transparent_hugepage/defrag + echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag + echo madvise >/sys/kernel/mm/transparent_hugepage/defrag + echo never >/sys/kernel/mm/transparent_hugepage/defrag + +always + means that an application requesting THP will stall on + allocation failure and directly reclaim pages and compact + memory in an effort to allocate a THP immediately. This may be + desirable for virtual machines that benefit heavily from THP + use and are willing to delay the VM start to utilise them. + +defer + means that an application will wake kswapd in the background + to reclaim pages and wake kcompactd to compact memory so that + THP is available in the near future. It's the responsibility + of khugepaged to then install the THP pages later. + +defer+madvise + will enter direct reclaim and compaction like ``always``, but + only for regions that have used madvise(MADV_HUGEPAGE); all + other regions will wake kswapd in the background to reclaim + pages and wake kcompactd to compact memory so that THP is + available in the near future. + +madvise + will enter direct reclaim like ``always`` but only for regions + that are have used madvise(MADV_HUGEPAGE). This is the default + behaviour. + +never + should be self-explanatory. + +By default kernel tries to use huge zero page on read page fault to +anonymous mapping. It's possible to disable huge zero page by writing 0 +or enable it back by writing 1:: + + echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page + echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page + +Some userspace (such as a test program, or an optimized memory allocation +library) may want to know the size (in bytes) of a transparent hugepage:: + + cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size + +khugepaged will be automatically started when +transparent_hugepage/enabled is set to "always" or "madvise, and it'll +be automatically shutdown if it's set to "never". + +Khugepaged controls +------------------- + +khugepaged runs usually at low frequency so while one may not want to +invoke defrag algorithms synchronously during the page faults, it +should be worth invoking defrag at least in khugepaged. However it's +also possible to disable defrag in khugepaged by writing 0 or enable +defrag in khugepaged by writing 1:: + + echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag + echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag + +You can also control how many pages khugepaged should scan at each +pass:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan + +and how many milliseconds to wait in khugepaged between each pass (you +can set this to 0 to run khugepaged at 100% utilization of one core):: + + /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs + +and how many milliseconds to wait in khugepaged if there's an hugepage +allocation failure to throttle the next allocation attempt:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs + +The khugepaged progress can be seen in the number of pages collapsed:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed + +for each pass:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/full_scans + +``max_ptes_none`` specifies how many extra small pages (that are +not already mapped) can be allocated when collapsing a group +of small pages into one large page:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none + +A higher value leads to use additional memory for programs. +A lower value leads to gain less thp performance. Value of +max_ptes_none can waste cpu time very little, you can +ignore it. + +``max_ptes_swap`` specifies how many pages can be brought in from +swap when collapsing a group of pages into a transparent huge page:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap + +A higher value can cause excessive swap IO and waste +memory. A lower value can prevent THPs from being +collapsed, resulting fewer pages being collapsed into +THPs, and lower memory access performance. + +Boot parameter +============== + +You can change the sysfs boot time defaults of Transparent Hugepage +Support by passing the parameter ``transparent_hugepage=always`` or +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` +to the kernel command line. + +Hugepages in tmpfs/shmem +======================== + +You can control hugepage allocation policy in tmpfs with mount option +``huge=``. It can have following values: + +always + Attempt to allocate huge pages every time we need a new page; + +never + Do not allocate huge pages; + +within_size + Only allocate huge page if it will be fully within i_size. + Also respect fadvise()/madvise() hints; + +advise + Only allocate huge pages if requested with fadvise()/madvise(); + +The default policy is ``never``. + +``mount -o remount,huge= /mountpoint`` works fine after mount: remounting +``huge=never`` will not attempt to break up huge pages at all, just stop more +from being allocated. + +There's also sysfs knob to control hugepage allocation policy for internal +shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. The mount +is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or +MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem. + +In addition to policies listed above, shmem_enabled allows two further +values: + +deny + For use in emergencies, to force the huge option off from + all mounts; +force + Force the huge option on for all - very useful for testing; + +Need of application restart +=========================== + +The transparent_hugepage/enabled values and tmpfs mount option only affect +future behavior. So to make them effective you need to restart any +application that could have been using hugepages. This also applies to the +regions registered in khugepaged. + +Monitoring usage +================ + +The number of anonymous transparent huge pages currently used by the +system is available by reading the AnonHugePages field in ``/proc/meminfo``. +To identify what applications are using anonymous transparent huge pages, +it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages fields +for each mapping. + +The number of file transparent huge pages mapped to userspace is available +by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``. +To identify what applications are mapping file transparent huge pages, it +is necessary to read ``/proc/PID/smaps`` and count the FileHugeMapped fields +for each mapping. + +Note that reading the smaps file is expensive and reading it +frequently will incur overhead. + +There are a number of counters in ``/proc/vmstat`` that may be used to +monitor how successfully the system is providing huge pages for use. + +thp_fault_alloc + is incremented every time a huge page is successfully + allocated to handle a page fault. This applies to both the + first time a page is faulted and for COW faults. + +thp_collapse_alloc + is incremented by khugepaged when it has found + a range of pages to collapse into one huge page and has + successfully allocated a new huge page to store the data. + +thp_fault_fallback + is incremented if a page fault fails to allocate + a huge page and instead falls back to using small pages. + +thp_collapse_alloc_failed + is incremented if khugepaged found a range + of pages that should be collapsed into one huge page but failed + the allocation. + +thp_file_alloc + is incremented every time a file huge page is successfully + allocated. + +thp_file_mapped + is incremented every time a file huge page is mapped into + user address space. + +thp_split_page + is incremented every time a huge page is split into base + pages. This can happen for a variety of reasons but a common + reason is that a huge page is old and is being reclaimed. + This action implies splitting all PMD the page mapped with. + +thp_split_page_failed + is incremented if kernel fails to split huge + page. This can happen if the page was pinned by somebody. + +thp_deferred_split_page + is incremented when a huge page is put onto split + queue. This happens when a huge page is partially unmapped and + splitting it would free up some memory. Pages on split queue are + going to be split under memory pressure. + +thp_split_pmd + is incremented every time a PMD split into table of PTEs. + This can happen, for instance, when application calls mprotect() or + munmap() on part of huge page. It doesn't split huge page, only + page table entry. + +thp_zero_page_alloc + is incremented every time a huge zero page is + successfully allocated. It includes allocations which where + dropped due race with other allocation. Note, it doesn't count + every map of the huge zero page, only its allocation. + +thp_zero_page_alloc_failed + is incremented if kernel fails to allocate + huge zero page and falls back to using small pages. + +thp_swpout + is incremented every time a huge page is swapout in one + piece without splitting. + +thp_swpout_fallback + is incremented if a huge page has to be split before swapout. + Usually because failed to allocate some continuous swap space + for the huge page. + +As the system ages, allocating huge pages may be expensive as the +system uses memory compaction to copy data around memory to free a +huge page for use. There are some counters in ``/proc/vmstat`` to help +monitor this overhead. + +compact_stall + is incremented every time a process stalls to run + memory compaction so that a huge page is free for use. + +compact_success + is incremented if the system compacted memory and + freed a huge page for use. + +compact_fail + is incremented if the system tries to compact memory + but failed. + +compact_pages_moved + is incremented each time a page is moved. If + this value is increasing rapidly, it implies that the system + is copying a lot of data to satisfy the huge page allocation. + It is possible that the cost of copying exceeds any savings + from reduced TLB misses. + +compact_pagemigrate_failed + is incremented when the underlying mechanism + for moving a page failed. + +compact_blocks_moved + is incremented each time memory compaction examines + a huge page aligned range of pages. + +It is possible to establish how long the stalls were using the function +tracer to record how long was spent in __alloc_pages_nodemask and +using the mm_page_alloc tracepoint to identify which allocations were +for huge pages. + +Optimizing the applications +=========================== + +To be guaranteed that the kernel will map a 2M page immediately in any +memory region, the mmap region has to be hugepage naturally +aligned. posix_memalign() can provide that guarantee. + +Hugetlbfs +========= + +You can use hugetlbfs on a kernel that has transparent hugepage +support enabled just fine as always. No difference can be noted in +hugetlbfs other than there will be less overall fragmentation. All +usual features belonging to hugetlbfs are preserved and +unaffected. libhugetlbfs will also work fine as usual. diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst new file mode 100644 index 0000000000000000000000000000000000000000..5048cf661a8ae72e9d04a066d574922f92facf22 --- /dev/null +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -0,0 +1,241 @@ +.. _userfaultfd: + +=========== +Userfaultfd +=========== + +Objective +========= + +Userfaults allow the implementation of on-demand paging from userland +and more generally they allow userland to take control of various +memory page faults, something otherwise only the kernel code could do. + +For example userfaults allows a proper and more optimal implementation +of the PROT_NONE+SIGSEGV trick. + +Design +====== + +Userfaults are delivered and resolved through the userfaultfd syscall. + +The userfaultfd (aside from registering and unregistering virtual +memory ranges) provides two primary functionalities: + +1) read/POLLIN protocol to notify a userland thread of the faults + happening + +2) various UFFDIO_* ioctls that can manage the virtual memory regions + registered in the userfaultfd that allows userland to efficiently + resolve the userfaults it receives via 1) or to manage the virtual + memory in the background + +The real advantage of userfaults if compared to regular virtual memory +management of mremap/mprotect is that the userfaults in all their +operations never involve heavyweight structures like vmas (in fact the +userfaultfd runtime load never takes the mmap_sem for writing). + +Vmas are not suitable for page- (or hugepage) granular fault tracking +when dealing with virtual address spaces that could span +Terabytes. Too many vmas would be needed for that. + +The userfaultfd once opened by invoking the syscall, can also be +passed using unix domain sockets to a manager process, so the same +manager process could handle the userfaults of a multitude of +different processes without them being aware about what is going on +(well of course unless they later try to use the userfaultfd +themselves on the same region the manager is already tracking, which +is a corner case that would currently return -EBUSY). + +API +=== + +When first opened the userfaultfd must be enabled invoking the +UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or +a later API version) which will specify the read/POLLIN protocol +userland intends to speak on the UFFD and the uffdio_api.features +userland requires. The UFFDIO_API ioctl if successful (i.e. if the +requested uffdio_api.api is spoken also by the running kernel and the +requested features are going to be enabled) will return into +uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of +respectively all the available features of the read(2) protocol and +the generic ioctl available. + +The uffdio_api.features bitmask returned by the UFFDIO_API ioctl +defines what memory types are supported by the userfaultfd and what +events, except page fault notifications, may be generated. + +If the kernel supports registering userfaultfd ranges on hugetlbfs +virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in +uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be +set if the kernel supports registering userfaultfd ranges on shared +memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero +MAP_SHARED, memfd_create, etc). + +The userland application that wants to use userfaultfd with hugetlbfs +or shared memory need to set the corresponding flag in +uffdio_api.features to enable those features. + +If the userland desires to receive notifications for events other than +page faults, it has to verify that uffdio_api.features has appropriate +UFFD_FEATURE_EVENT_* bits set. These events are described in more +detail below in "Non-cooperative userfaultfd" section. + +Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should +be invoked (if present in the returned uffdio_api.ioctls bitmask) to +register a memory range in the userfaultfd by setting the +uffdio_register structure accordingly. The uffdio_register.mode +bitmask will specify to the kernel which kind of faults to track for +the range (UFFDIO_REGISTER_MODE_MISSING would track missing +pages). The UFFDIO_REGISTER ioctl will return the +uffdio_register.ioctls bitmask of ioctls that are suitable to resolve +userfaults on the range registered. Not all ioctls will necessarily be +supported for all memory types depending on the underlying virtual +memory backend (anonymous memory vs tmpfs vs real filebacked +mappings). + +Userland can use the uffdio_register.ioctls to manage the virtual +address space in the background (to add or potentially also remove +memory from the userfaultfd registered range). This means a userfault +could be triggering just before userland maps in the background the +user-faulted page. + +The primary ioctl to resolve userfaults is UFFDIO_COPY. That +atomically copies a page into the userfault registered range and wakes +up the blocked userfaults (unless uffdio_copy.mode & +UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to +UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an +half copied page since it'll keep userfaulting until the copy has +finished. + +QEMU/KVM +======== + +QEMU/KVM is using the userfaultfd syscall to implement postcopy live +migration. Postcopy live migration is one form of memory +externalization consisting of a virtual machine running with part or +all of its memory residing on a different node in the cloud. The +userfaultfd abstraction is generic enough that not a single line of +KVM kernel code had to be modified in order to add postcopy live +migration to QEMU. + +Guest async page faults, FOLL_NOWAIT and all other GUP features work +just fine in combination with userfaults. Userfaults trigger async +page faults in the guest scheduler so those guest processes that +aren't waiting for userfaults (i.e. network bound) can keep running in +the guest vcpus. + +It is generally beneficial to run one pass of precopy live migration +just before starting postcopy live migration, in order to avoid +generating userfaults for readonly guest regions. + +The implementation of postcopy live migration currently uses one +single bidirectional socket but in the future two different sockets +will be used (to reduce the latency of the userfaults to the minimum +possible without having to decrease /proc/sys/net/ipv4/tcp_wmem). + +The QEMU in the source node writes all pages that it knows are missing +in the destination node, into the socket, and the migration thread of +the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE +ioctls on the userfaultfd in order to map the received pages into the +guest (UFFDIO_ZEROCOPY is used if the source page was a zero page). + +A different postcopy thread in the destination node listens with +poll() to the userfaultfd in parallel. When a POLLIN event is +generated after a userfault triggers, the postcopy thread read() from +the userfaultfd and receives the fault address (or -EAGAIN in case the +userfault was already resolved and waken by a UFFDIO_COPY|ZEROPAGE run +by the parallel QEMU migration thread). + +After the QEMU postcopy thread (running in the destination node) gets +the userfault address it writes the information about the missing page +into the socket. The QEMU source node receives the information and +roughly "seeks" to that page address and continues sending all +remaining missing pages from that new page offset. Soon after that +(just the time to flush the tcp_wmem queue through the network) the +migration thread in the QEMU running in the destination node will +receive the page that triggered the userfault and it'll map it as +usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it +was spontaneously sent by the source or if it was an urgent page +requested through a userfault). + +By the time the userfaults start, the QEMU in the destination node +doesn't need to keep any per-page state bitmap relative to the live +migration around and a single per-page bitmap has to be maintained in +the QEMU running in the source node to know which pages are still +missing in the destination node. The bitmap in the source node is +checked to find which missing pages to send in round robin and we seek +over it when receiving incoming userfaults. After sending each page of +course the bitmap is updated accordingly. It's also useful to avoid +sending the same page twice (in case the userfault is read by the +postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration +thread). + +Non-cooperative userfaultfd +=========================== + +When the userfaultfd is monitored by an external manager, the manager +must be able to track changes in the process virtual memory +layout. Userfaultfd can notify the manager about such changes using +the same read(2) protocol as for the page fault notifications. The +manager has to explicitly enable these events by setting appropriate +bits in uffdio_api.features passed to UFFDIO_API ioctl: + +UFFD_FEATURE_EVENT_FORK + enable userfaultfd hooks for fork(). When this feature is + enabled, the userfaultfd context of the parent process is + duplicated into the newly created process. The manager + receives UFFD_EVENT_FORK with file descriptor of the new + userfaultfd context in the uffd_msg.fork. + +UFFD_FEATURE_EVENT_REMAP + enable notifications about mremap() calls. When the + non-cooperative process moves a virtual memory area to a + different location, the manager will receive + UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and + new addresses of the area and its original length. + +UFFD_FEATURE_EVENT_REMOVE + enable notifications about madvise(MADV_REMOVE) and + madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will + be generated upon these calls to madvise. The uffd_msg.remove + will contain start and end addresses of the removed area. + +UFFD_FEATURE_EVENT_UNMAP + enable notifications about memory unmapping. The manager will + get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and + end addresses of the unmapped area. + +Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP +are pretty similar, they quite differ in the action expected from the +userfaultfd manager. In the former case, the virtual memory is +removed, but the area is not, the area remains monitored by the +userfaultfd, and if a page fault occurs in that area it will be +delivered to the manager. The proper resolution for such page fault is +to zeromap the faulting address. However, in the latter case, when an +area is unmapped, either explicitly (with munmap() system call), or +implicitly (e.g. during mremap()), the area is removed and in turn the +userfaultfd context for such area disappears too and the manager will +not get further userland page faults from the removed area. Still, the +notification is required in order to prevent manager from using +UFFDIO_COPY on the unmapped area. + +Unlike userland page faults which have to be synchronous and require +explicit or implicit wakeup, all the events are delivered +asynchronously and the non-cooperative process resumes execution as +soon as manager executes read(). The userfaultfd manager should +carefully synchronize calls to UFFDIO_COPY with the events +processing. To aid the synchronization, the UFFDIO_COPY ioctl will +return -ENOSPC when the monitored process exits at the time of +UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed +its virtual memory layout simultaneously with outstanding UFFDIO_COPY +operation. + +The current asynchronous model of the event delivery is optimal for +single threaded non-cooperative userfaultfd manager implementations. A +synchronous event delivery model can be added later as a new +userfaultfd feature to facilitate multithreading enhancements of the +non cooperative manager, for example to allow UFFDIO_COPY ioctls to +run in parallel to the event reception. Single threaded +implementations should continue to use the current async event +delivery model instead. diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index ab2fe0eda1d7c317faefab52363ce96755ac64d5..8f1d3de449b53fedcc78d1aee506e6882f2be90c 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -324,8 +324,7 @@ Global Attributes ``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to control its functionality at the system level. They are located in the -``/sys/devices/system/cpu/cpufreq/intel_pstate/`` directory and affect all -CPUs. +``/sys/devices/system/cpu/intel_pstate/`` directory and affect all CPUs. Some of them are not present if the ``intel_pstate=per_cpu_perf_limits`` argument is passed to the kernel in the command line. @@ -379,6 +378,17 @@ argument is passed to the kernel in the command line. but it affects the maximum possible value of per-policy P-state limits (see `Interpretation of Policy Attributes`_ below for details). +``hwp_dynamic_boost`` + This attribute is only present if ``intel_pstate`` works in the + `active mode with the HWP feature enabled `_ in + the processor. If set (equal to 1), it causes the minimum P-state limit + to be increased dynamically for a short time whenever a task previously + waiting on I/O is selected to run on a given logical CPU (the purpose + of this mechanism is to improve performance). + + This setting has no effect on logical CPUs whose minimum P-state limit + is directly set to the highest non-turbo P-state or above it. + .. _status_attr: ``status`` @@ -410,7 +420,7 @@ argument is passed to the kernel in the command line. That only is supported in some configurations, though (for example, if the `HWP feature is enabled in the processor `_, the operation mode of the driver cannot be changed), and if it is not - supported in the current configuration, writes to this attribute with + supported in the current configuration, writes to this attribute will fail with an appropriate error. Interpretation of Policy Attributes diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst index 4efd7ce7756536e3ddb4c9e2356996fd93683d50..6dbcc5481000a1c5ec3fb5864cf2d188985eeb8f 100644 --- a/Documentation/admin-guide/ramoops.rst +++ b/Documentation/admin-guide/ramoops.rst @@ -61,7 +61,7 @@ Setting the ramoops parameters can be done in several different manners: mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1 B. Use Device Tree bindings, as described in - ``Documentation/device-tree/bindings/reserved-memory/admin-guide/ramoops.rst``. + ``Documentation/devicetree/bindings/reserved-memory/ramoops.txt``. For example:: reserved-memory { diff --git a/Documentation/arm/Marvell/README b/Documentation/arm/Marvell/README index b5bb7f5188406f8b47a0b55d921512c06dcf817f..56ada27c53be6a3481bc3b57184aa98acf72faf5 100644 --- a/Documentation/arm/Marvell/README +++ b/Documentation/arm/Marvell/README @@ -302,19 +302,15 @@ Berlin family (Multimedia Solutions) 88DE3010, Armada 1000 (no Linux support) Core: Marvell PJ1 (ARMv5TE), Dual-core Product Brief: http://www.marvell.com.cn/digital-entertainment/assets/armada_1000_pb.pdf - 88DE3005, Armada 1500-mini 88DE3005, Armada 1500 Mini Design name: BG2CD Core: ARM Cortex-A9, PL310 L2CC - Homepage: http://www.marvell.com/multimedia-solutions/armada-1500-mini/ - 88DE3006, Armada 1500 Mini Plus - Design name: BG2CDP - Core: Dual Core ARM Cortex-A7 - Homepage: http://www.marvell.com/multimedia-solutions/armada-1500-mini-plus/ + 88DE3006, Armada 1500 Mini Plus + Design name: BG2CDP + Core: Dual Core ARM Cortex-A7 88DE3100, Armada 1500 Design name: BG2 Core: Marvell PJ4B-MP (ARMv7), Tauros3 L2CC - Product Brief: http://www.marvell.com/digital-entertainment/armada-1500/assets/Marvell-ARMADA-1500-Product-Brief.pdf 88DE3114, Armada 1500 Pro Design name: BG2Q Core: Quad Core ARM Cortex-A9, PL310 L2CC @@ -324,13 +320,16 @@ Berlin family (Multimedia Solutions) 88DE3218, ARMADA 1500 Ultra Core: ARM Cortex-A53 - Homepage: http://www.marvell.com/multimedia-solutions/ + Homepage: https://www.synaptics.com/products/multimedia-solutions Directory: arch/arm/mach-berlin Comments: + * This line of SoCs is based on Marvell Sheeva or ARM Cortex CPUs with Synopsys DesignWare (IRQ, GPIO, Timers, ...) and PXA IP (SDHCI, USB, ETH, ...). + * The Berlin family was acquired by Synaptics from Marvell in 2017. + CPU Cores --------- diff --git a/Documentation/arm/OMAP/README b/Documentation/arm/OMAP/README index 75645c45d14a17a56a83a6bfde987783794852ee..90c6c57d61e8d459f6e9990af37d839e41ae9a0e 100644 --- a/Documentation/arm/OMAP/README +++ b/Documentation/arm/OMAP/README @@ -5,3 +5,7 @@ KERNEL NEW DEPENDENCIES v4.3+ Update is needed for custom .config files to make sure CONFIG_REGULATOR_PBIAS is enabled for MMC1 to work properly. + +v4.18+ Update is needed for custom .config files to make sure + CONFIG_MMC_SDHCI_OMAP is enabled for all MMC instances + to work in DRA7 and K2G based boards. diff --git a/Documentation/misc-devices/lcd-panel-cgram.txt b/Documentation/auxdisplay/lcd-panel-cgram.txt similarity index 100% rename from Documentation/misc-devices/lcd-panel-cgram.txt rename to Documentation/auxdisplay/lcd-panel-cgram.txt diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 86927029a52db81ed81547fac93b09a6ffdd08da..207eca58efaaa2c652d63412b115acd0ec36a32f 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -752,18 +752,6 @@ completion of the request to the block layer. This means ending tag operations before calling end_that_request_last()! For an example of a user of these helpers, see the IDE tagged command queueing support. -Certain hardware conditions may dictate a need to invalidate the block tag -queue. For instance, on IDE any tagged request error needs to clear both -the hardware and software block queue and enable the driver to sanely restart -all the outstanding requests. There's a third helper to do that: - - blk_queue_invalidate_tags(struct request_queue *q) - - Clear the internal block tag queue and re-add all the pending requests - to the request queue. The driver will receive them again on the - next request_fn run, just like it did the first time it encountered - them. - 3.2.5.2 Tag info Some block functions exist to query current tag status or to go from a @@ -805,8 +793,7 @@ Internally, block manages tags in the blk_queue_tag structure: Most of the above is simple and straight forward, however busy_list may need a bit of explaining. Normally we don't care too much about request ordering, but in the event of any barrier requests in the tag queue we need to ensure -that requests are restarted in the order they were queue. This may happen -if the driver needs to use blk_queue_invalidate_tags(). +that requests are restarted in the order they were queue. 3.3 I/O Submission diff --git a/Documentation/block/cmdline-partition.txt b/Documentation/block/cmdline-partition.txt index 525b9f6d7fb49e4c9bbd5f7ea76833dcde640e59..760a3f7c3ed4ef6d2cc9bdbac01a3709e328e9e5 100644 --- a/Documentation/block/cmdline-partition.txt +++ b/Documentation/block/cmdline-partition.txt @@ -1,7 +1,9 @@ Embedded device command line partition parsing ===================================================================== -Support for reading the block device partition table from the command line. +The "blkdevparts" command line option adds support for reading the +block device partition table from the kernel command line. + It is typically used for fixed block (eMMC) embedded devices. It has no MBR, so saves storage space. Bootloader can be easily accessed by absolute address of data on the block device. @@ -14,22 +16,27 @@ blkdevparts=[;] := [@](part-name) - block device disk name, embedded device used fixed block device, - it's disk name also fixed. such as: mmcblk0, mmcblk1, mmcblk0boot0. + block device disk name. Embedded device uses fixed block device. + Its disk name is also fixed, such as: mmcblk0, mmcblk1, mmcblk0boot0. partition size, in bytes, such as: 512, 1m, 1G. + size may contain an optional suffix of (upper or lower case): + K, M, G, T, P, E. + "-" is used to denote all remaining space. partition start address, in bytes. + offset may contain an optional suffix of (upper or lower case): + K, M, G, T, P, E. (part-name) - partition name, kernel send uevent with "PARTNAME". application can create - a link to block device partition with the name "PARTNAME". - user space application can access partition by partition name. + partition name. Kernel sends uevent with "PARTNAME". Application can + create a link to block device partition with the name "PARTNAME". + User space application can access partition by partition name. Example: - eMMC disk name is "mmcblk0" and "mmcblk0boot0" + eMMC disk names are "mmcblk0" and "mmcblk0boot0". bootargs: 'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)' diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt index 733927a7b501a5f93b518db8417db181b0e8bbb7..07f147381f3270e9209f8ab29a236d8fdda32352 100644 --- a/Documentation/block/null_blk.txt +++ b/Documentation/block/null_blk.txt @@ -71,13 +71,16 @@ use_per_node_hctx=[0/1]: Default: 0 1: The multi-queue block layer is instantiated with a hardware dispatch queue for each CPU node in the system. -use_lightnvm=[0/1]: Default: 0 - Register device with LightNVM. Requires blk-mq and CONFIG_NVM to be enabled. - no_sched=[0/1]: Default: 0 0: nullb* use default blk-mq io scheduler. 1: nullb* doesn't use io scheduler. +blocking=[0/1]: Default: 0 + 0: Register as a non-blocking blk-mq driver device. + 1: Register as a blocking blk-mq driver device, null_blk will set + the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always + needs to block in its ->queue_rq() function. + shared_tags=[0/1]: Default: 0 0: Tag set is not shared. 1: Tag set shared between devices for blk-mq. Only makes sense with diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 257e65714c6a216f3fce9ebce7398eb821f96176..875b2b56b87fc88131324bb10bbecc8594e02109 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -218,6 +218,7 @@ line of text and contains the following stats separated by whitespace: same_pages the number of same element filled pages written to this disk. No memory is allocated for such pages. pages_compacted the number of pages freed during compaction + huge_pages the number of incompressible pages 9) Deactivate: swapoff /dev/zram0 @@ -242,5 +243,29 @@ to backing storage rather than keeping it in memory. User should set up backing device via /sys/block/zramX/backing_dev before disksize setting. += memory tracking + +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the +zram block. It could be useful to catch cold or incompressible +pages of the process with*pagemap. +If you enable the feature, you could see block state via +/sys/kernel/debug/zram/zram0/block_state". The output is as follows, + + 300 75.033841 .wh + 301 63.806904 s.. + 302 63.806919 ..h + +First column is zram's block index. +Second column is access time since the system was booted +Third column is state of the block. +(s: same page +w: written page to backing store +h: huge page) + +First line of above example says 300th block is accessed at 75.033841sec +and the block's state is huge so it is written back to the backing +storage. It's a debugging feature so anyone shouldn't rely on it to work +properly. + Nitin Gupta ngupta@vflare.org diff --git a/Documentation/bpf/README.rst b/Documentation/bpf/README.rst new file mode 100644 index 0000000000000000000000000000000000000000..b9a80c9e9392cc096fe0c0453c27a21331a93b66 --- /dev/null +++ b/Documentation/bpf/README.rst @@ -0,0 +1,36 @@ +================= +BPF documentation +================= + +This directory contains documentation for the BPF (Berkeley Packet +Filter) facility, with a focus on the extended BPF version (eBPF). + +This kernel side documentation is still work in progress. The main +textual documentation is (for historical reasons) described in +`Documentation/networking/filter.txt`_, which describe both classical +and extended BPF instruction-set. +The Cilium project also maintains a `BPF and XDP Reference Guide`_ +that goes into great technical depth about the BPF Architecture. + +The primary info for the bpf syscall is available in the `man-pages`_ +for `bpf(2)`_. + + + +Frequently asked questions (FAQ) +================================ + +Two sets of Questions and Answers (Q&A) are maintained. + +* QA for common questions about BPF see: bpf_design_QA_ + +* QA for developers interacting with BPF subsystem: bpf_devel_QA_ + + +.. Links: +.. _bpf_design_QA: bpf_design_QA.rst +.. _bpf_devel_QA: bpf_devel_QA.rst +.. _Documentation/networking/filter.txt: ../networking/filter.txt +.. _man-pages: https://www.kernel.org/doc/man-pages/ +.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html +.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/ diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst new file mode 100644 index 0000000000000000000000000000000000000000..6780a6d8174580ea1caeac4a13fb4f8dae6bd91b --- /dev/null +++ b/Documentation/bpf/bpf_design_QA.rst @@ -0,0 +1,221 @@ +============== +BPF Design Q&A +============== + +BPF extensibility and applicability to networking, tracing, security +in the linux kernel and several user space implementations of BPF +virtual machine led to a number of misunderstanding on what BPF actually is. +This short QA is an attempt to address that and outline a direction +of where BPF is heading long term. + +.. contents:: + :local: + :depth: 3 + +Questions and Answers +===================== + +Q: Is BPF a generic instruction set similar to x64 and arm64? +------------------------------------------------------------- +A: NO. + +Q: Is BPF a generic virtual machine ? +------------------------------------- +A: NO. + +BPF is generic instruction set *with* C calling convention. +----------------------------------------------------------- + +Q: Why C calling convention was chosen? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A: Because BPF programs are designed to run in the linux kernel +which is written in C, hence BPF defines instruction set compatible +with two most used architectures x64 and arm64 (and takes into +consideration important quirks of other architectures) and +defines calling convention that is compatible with C calling +convention of the linux kernel on those architectures. + +Q: can multiple return values be supported in the future? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A: NO. BPF allows only register R0 to be used as return value. + +Q: can more than 5 function arguments be supported in the future? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A: NO. BPF calling convention only allows registers R1-R5 to be used +as arguments. BPF is not a standalone instruction set. +(unlike x64 ISA that allows msft, cdecl and other conventions) + +Q: can BPF programs access instruction pointer or return address? +----------------------------------------------------------------- +A: NO. + +Q: can BPF programs access stack pointer ? +------------------------------------------ +A: NO. + +Only frame pointer (register R10) is accessible. +From compiler point of view it's necessary to have stack pointer. +For example LLVM defines register R11 as stack pointer in its +BPF backend, but it makes sure that generated code never uses it. + +Q: Does C-calling convention diminishes possible use cases? +----------------------------------------------------------- +A: YES. + +BPF design forces addition of major functionality in the form +of kernel helper functions and kernel objects like BPF maps with +seamless interoperability between them. It lets kernel call into +BPF programs and programs call kernel helpers with zero overhead. +As all of them were native C code. That is particularly the case +for JITed BPF programs that are indistinguishable from +native kernel C code. + +Q: Does it mean that 'innovative' extensions to BPF code are disallowed? +------------------------------------------------------------------------ +A: Soft yes. + +At least for now until BPF core has support for +bpf-to-bpf calls, indirect calls, loops, global variables, +jump tables, read only sections and all other normal constructs +that C code can produce. + +Q: Can loops be supported in a safe way? +---------------------------------------- +A: It's not clear yet. + +BPF developers are trying to find a way to +support bounded loops where the verifier can guarantee that +the program terminates in less than 4096 instructions. + +Instruction level questions +--------------------------- + +Q: LD_ABS and LD_IND instructions vs C code +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Q: How come LD_ABS and LD_IND instruction are present in BPF whereas +C code cannot express them and has to use builtin intrinsics? + +A: This is artifact of compatibility with classic BPF. Modern +networking code in BPF performs better without them. +See 'direct packet access'. + +Q: BPF instructions mapping not one-to-one to native CPU +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Q: It seems not all BPF instructions are one-to-one to native CPU. +For example why BPF_JNE and other compare and jumps are not cpu-like? + +A: This was necessary to avoid introducing flags into ISA which are +impossible to make generic and efficient across CPU architectures. + +Q: why BPF_DIV instruction doesn't map to x64 div? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A: Because if we picked one-to-one relationship to x64 it would have made +it more complicated to support on arm64 and other archs. Also it +needs div-by-zero runtime check. + +Q: why there is no BPF_SDIV for signed divide operation? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A: Because it would be rarely used. llvm errors in such case and +prints a suggestion to use unsigned divide instead + +Q: Why BPF has implicit prologue and epilogue? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A: Because architectures like sparc have register windows and in general +there are enough subtle differences between architectures, so naive +store return address into stack won't work. Another reason is BPF has +to be safe from division by zero (and legacy exception path +of LD_ABS insn). Those instructions need to invoke epilogue and +return implicitly. + +Q: Why BPF_JLT and BPF_JLE instructions were not introduced in the beginning? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A: Because classic BPF didn't have them and BPF authors felt that compiler +workaround would be acceptable. Turned out that programs lose performance +due to lack of these compare instructions and they were added. +These two instructions is a perfect example what kind of new BPF +instructions are acceptable and can be added in the future. +These two already had equivalent instructions in native CPUs. +New instructions that don't have one-to-one mapping to HW instructions +will not be accepted. + +Q: BPF 32-bit subregister requirements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF +registers which makes BPF inefficient virtual machine for 32-bit +CPU architectures and 32-bit HW accelerators. Can true 32-bit registers +be added to BPF in the future? + +A: NO. The first thing to improve performance on 32-bit archs is to teach +LLVM to generate code that uses 32-bit subregisters. Then second step +is to teach verifier to mark operations where zero-ing upper bits +is unnecessary. Then JITs can take advantage of those markings and +drastically reduce size of generated code and improve performance. + +Q: Does BPF have a stable ABI? +------------------------------ +A: YES. BPF instructions, arguments to BPF programs, set of helper +functions and their arguments, recognized return codes are all part +of ABI. However when tracing programs are using bpf_probe_read() helper +to walk kernel internal datastructures and compile with kernel +internal headers these accesses can and will break with newer +kernels. The union bpf_attr -> kern_version is checked at load time +to prevent accidentally loading kprobe-based bpf programs written +for a different kernel. Networking programs don't do kern_version check. + +Q: How much stack space a BPF program uses? +------------------------------------------- +A: Currently all program types are limited to 512 bytes of stack +space, but the verifier computes the actual amount of stack used +and both interpreter and most JITed code consume necessary amount. + +Q: Can BPF be offloaded to HW? +------------------------------ +A: YES. BPF HW offload is supported by NFP driver. + +Q: Does classic BPF interpreter still exist? +-------------------------------------------- +A: NO. Classic BPF programs are converted into extend BPF instructions. + +Q: Can BPF call arbitrary kernel functions? +------------------------------------------- +A: NO. BPF programs can only call a set of helper functions which +is defined for every program type. + +Q: Can BPF overwrite arbitrary kernel memory? +--------------------------------------------- +A: NO. + +Tracing bpf programs can *read* arbitrary memory with bpf_probe_read() +and bpf_probe_read_str() helpers. Networking programs cannot read +arbitrary memory, since they don't have access to these helpers. +Programs can never read or write arbitrary memory directly. + +Q: Can BPF overwrite arbitrary user memory? +------------------------------------------- +A: Sort-of. + +Tracing BPF programs can overwrite the user memory +of the current task with bpf_probe_write_user(). Every time such +program is loaded the kernel will print warning message, so +this helper is only useful for experiments and prototypes. +Tracing BPF programs are root only. + +Q: bpf_trace_printk() helper warning +------------------------------------ +Q: When bpf_trace_printk() helper is used the kernel prints nasty +warning message. Why is that? + +A: This is done to nudge program authors into better interfaces when +programs need to pass data to user space. Like bpf_perf_event_output() +can be used to efficiently stream data via perf ring buffer. +BPF maps can be used for asynchronous data sharing between kernel +and user space. bpf_trace_printk() should only be used for debugging. + +Q: New functionality via kernel modules? +---------------------------------------- +Q: Can BPF functionality such as new program or map types, new +helpers, etc be added out of kernel module code? + +A: NO. diff --git a/Documentation/bpf/bpf_design_QA.txt b/Documentation/bpf/bpf_design_QA.txt deleted file mode 100644 index f3e458a0bb2f1f26e376239ffc4200b13294debf..0000000000000000000000000000000000000000 --- a/Documentation/bpf/bpf_design_QA.txt +++ /dev/null @@ -1,156 +0,0 @@ -BPF extensibility and applicability to networking, tracing, security -in the linux kernel and several user space implementations of BPF -virtual machine led to a number of misunderstanding on what BPF actually is. -This short QA is an attempt to address that and outline a direction -of where BPF is heading long term. - -Q: Is BPF a generic instruction set similar to x64 and arm64? -A: NO. - -Q: Is BPF a generic virtual machine ? -A: NO. - -BPF is generic instruction set _with_ C calling convention. - -Q: Why C calling convention was chosen? -A: Because BPF programs are designed to run in the linux kernel - which is written in C, hence BPF defines instruction set compatible - with two most used architectures x64 and arm64 (and takes into - consideration important quirks of other architectures) and - defines calling convention that is compatible with C calling - convention of the linux kernel on those architectures. - -Q: can multiple return values be supported in the future? -A: NO. BPF allows only register R0 to be used as return value. - -Q: can more than 5 function arguments be supported in the future? -A: NO. BPF calling convention only allows registers R1-R5 to be used - as arguments. BPF is not a standalone instruction set. - (unlike x64 ISA that allows msft, cdecl and other conventions) - -Q: can BPF programs access instruction pointer or return address? -A: NO. - -Q: can BPF programs access stack pointer ? -A: NO. Only frame pointer (register R10) is accessible. - From compiler point of view it's necessary to have stack pointer. - For example LLVM defines register R11 as stack pointer in its - BPF backend, but it makes sure that generated code never uses it. - -Q: Does C-calling convention diminishes possible use cases? -A: YES. BPF design forces addition of major functionality in the form - of kernel helper functions and kernel objects like BPF maps with - seamless interoperability between them. It lets kernel call into - BPF programs and programs call kernel helpers with zero overhead. - As all of them were native C code. That is particularly the case - for JITed BPF programs that are indistinguishable from - native kernel C code. - -Q: Does it mean that 'innovative' extensions to BPF code are disallowed? -A: Soft yes. At least for now until BPF core has support for - bpf-to-bpf calls, indirect calls, loops, global variables, - jump tables, read only sections and all other normal constructs - that C code can produce. - -Q: Can loops be supported in a safe way? -A: It's not clear yet. BPF developers are trying to find a way to - support bounded loops where the verifier can guarantee that - the program terminates in less than 4096 instructions. - -Q: How come LD_ABS and LD_IND instruction are present in BPF whereas - C code cannot express them and has to use builtin intrinsics? -A: This is artifact of compatibility with classic BPF. Modern - networking code in BPF performs better without them. - See 'direct packet access'. - -Q: It seems not all BPF instructions are one-to-one to native CPU. - For example why BPF_JNE and other compare and jumps are not cpu-like? -A: This was necessary to avoid introducing flags into ISA which are - impossible to make generic and efficient across CPU architectures. - -Q: why BPF_DIV instruction doesn't map to x64 div? -A: Because if we picked one-to-one relationship to x64 it would have made - it more complicated to support on arm64 and other archs. Also it - needs div-by-zero runtime check. - -Q: why there is no BPF_SDIV for signed divide operation? -A: Because it would be rarely used. llvm errors in such case and - prints a suggestion to use unsigned divide instead - -Q: Why BPF has implicit prologue and epilogue? -A: Because architectures like sparc have register windows and in general - there are enough subtle differences between architectures, so naive - store return address into stack won't work. Another reason is BPF has - to be safe from division by zero (and legacy exception path - of LD_ABS insn). Those instructions need to invoke epilogue and - return implicitly. - -Q: Why BPF_JLT and BPF_JLE instructions were not introduced in the beginning? -A: Because classic BPF didn't have them and BPF authors felt that compiler - workaround would be acceptable. Turned out that programs lose performance - due to lack of these compare instructions and they were added. - These two instructions is a perfect example what kind of new BPF - instructions are acceptable and can be added in the future. - These two already had equivalent instructions in native CPUs. - New instructions that don't have one-to-one mapping to HW instructions - will not be accepted. - -Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF - registers which makes BPF inefficient virtual machine for 32-bit - CPU architectures and 32-bit HW accelerators. Can true 32-bit registers - be added to BPF in the future? -A: NO. The first thing to improve performance on 32-bit archs is to teach - LLVM to generate code that uses 32-bit subregisters. Then second step - is to teach verifier to mark operations where zero-ing upper bits - is unnecessary. Then JITs can take advantage of those markings and - drastically reduce size of generated code and improve performance. - -Q: Does BPF have a stable ABI? -A: YES. BPF instructions, arguments to BPF programs, set of helper - functions and their arguments, recognized return codes are all part - of ABI. However when tracing programs are using bpf_probe_read() helper - to walk kernel internal datastructures and compile with kernel - internal headers these accesses can and will break with newer - kernels. The union bpf_attr -> kern_version is checked at load time - to prevent accidentally loading kprobe-based bpf programs written - for a different kernel. Networking programs don't do kern_version check. - -Q: How much stack space a BPF program uses? -A: Currently all program types are limited to 512 bytes of stack - space, but the verifier computes the actual amount of stack used - and both interpreter and most JITed code consume necessary amount. - -Q: Can BPF be offloaded to HW? -A: YES. BPF HW offload is supported by NFP driver. - -Q: Does classic BPF interpreter still exist? -A: NO. Classic BPF programs are converted into extend BPF instructions. - -Q: Can BPF call arbitrary kernel functions? -A: NO. BPF programs can only call a set of helper functions which - is defined for every program type. - -Q: Can BPF overwrite arbitrary kernel memory? -A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read() - and bpf_probe_read_str() helpers. Networking programs cannot read - arbitrary memory, since they don't have access to these helpers. - Programs can never read or write arbitrary memory directly. - -Q: Can BPF overwrite arbitrary user memory? -A: Sort-of. Tracing BPF programs can overwrite the user memory - of the current task with bpf_probe_write_user(). Every time such - program is loaded the kernel will print warning message, so - this helper is only useful for experiments and prototypes. - Tracing BPF programs are root only. - -Q: When bpf_trace_printk() helper is used the kernel prints nasty - warning message. Why is that? -A: This is done to nudge program authors into better interfaces when - programs need to pass data to user space. Like bpf_perf_event_output() - can be used to efficiently stream data via perf ring buffer. - BPF maps can be used for asynchronous data sharing between kernel - and user space. bpf_trace_printk() should only be used for debugging. - -Q: Can BPF functionality such as new program or map types, new - helpers, etc be added out of kernel module code? -A: NO. diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst new file mode 100644 index 0000000000000000000000000000000000000000..0e7c1d946e83818b32eb63fe0f4999c8aec15352 --- /dev/null +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -0,0 +1,640 @@ +================================= +HOWTO interact with BPF subsystem +================================= + +This document provides information for the BPF subsystem about various +workflows related to reporting bugs, submitting patches, and queueing +patches for stable kernels. + +For general information about submitting patches, please refer to +`Documentation/process/`_. This document only describes additional specifics +related to BPF. + +.. contents:: + :local: + :depth: 2 + +Reporting bugs +============== + +Q: How do I report bugs for BPF kernel code? +-------------------------------------------- +A: Since all BPF kernel development as well as bpftool and iproute2 BPF +loader development happens through the netdev kernel mailing list, +please report any found issues around BPF to the following mailing +list: + + netdev@vger.kernel.org + +This may also include issues related to XDP, BPF tracing, etc. + +Given netdev has a high volume of traffic, please also add the BPF +maintainers to Cc (from kernel MAINTAINERS_ file): + +* Alexei Starovoitov +* Daniel Borkmann + +In case a buggy commit has already been identified, make sure to keep +the actual commit authors in Cc as well for the report. They can +typically be identified through the kernel's git tree. + +**Please do NOT report BPF issues to bugzilla.kernel.org since it +is a guarantee that the reported issue will be overlooked.** + +Submitting patches +================== + +Q: To which mailing list do I need to submit my BPF patches? +------------------------------------------------------------ +A: Please submit your BPF patches to the netdev kernel mailing list: + + netdev@vger.kernel.org + +Historically, BPF came out of networking and has always been maintained +by the kernel networking community. Although these days BPF touches +many other subsystems as well, the patches are still routed mainly +through the networking community. + +In case your patch has changes in various different subsystems (e.g. +tracing, security, etc), make sure to Cc the related kernel mailing +lists and maintainers from there as well, so they are able to review +the changes and provide their Acked-by's to the patches. + +Q: Where can I find patches currently under discussion for BPF subsystem? +------------------------------------------------------------------------- +A: All patches that are Cc'ed to netdev are queued for review under netdev +patchwork project: + + http://patchwork.ozlabs.org/project/netdev/list/ + +Those patches which target BPF, are assigned to a 'bpf' delegate for +further processing from BPF maintainers. The current queue with +patches under review can be found at: + + https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 + +Once the patches have been reviewed by the BPF community as a whole +and approved by the BPF maintainers, their status in patchwork will be +changed to 'Accepted' and the submitter will be notified by mail. This +means that the patches look good from a BPF perspective and have been +applied to one of the two BPF kernel trees. + +In case feedback from the community requires a respin of the patches, +their status in patchwork will be set to 'Changes Requested', and purged +from the current review queue. Likewise for cases where patches would +get rejected or are not applicable to the BPF trees (but assigned to +the 'bpf' delegate). + +Q: How do the changes make their way into Linux? +------------------------------------------------ +A: There are two BPF kernel trees (git repositories). Once patches have +been accepted by the BPF maintainers, they will be applied to one +of the two BPF trees: + + * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/ + * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ + +The bpf tree itself is for fixes only, whereas bpf-next for features, +cleanups or other kind of improvements ("next-like" content). This is +analogous to net and net-next trees for networking. Both bpf and +bpf-next will only have a master branch in order to simplify against +which branch patches should get rebased to. + +Accumulated BPF patches in the bpf tree will regularly get pulled +into the net kernel tree. Likewise, accumulated BPF patches accepted +into the bpf-next tree will make their way into net-next tree. net and +net-next are both run by David S. Miller. From there, they will go +into the kernel mainline tree run by Linus Torvalds. To read up on the +process of net and net-next being merged into the mainline tree, see +the `netdev FAQ`_ under: + + `Documentation/networking/netdev-FAQ.txt`_ + +Occasionally, to prevent merge conflicts, we might send pull requests +to other trees (e.g. tracing) with a small subset of the patches, but +net and net-next are always the main trees targeted for integration. + +The pull requests will contain a high-level summary of the accumulated +patches and can be searched on netdev kernel mailing list through the +following subject lines (``yyyy-mm-dd`` is the date of the pull +request):: + + pull-request: bpf yyyy-mm-dd + pull-request: bpf-next yyyy-mm-dd + +Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to? +--------------------------------------------------------------------------------- + +A: The process is the very same as described in the `netdev FAQ`_, so +please read up on it. The subject line must indicate whether the +patch is a fix or rather "next-like" content in order to let the +maintainers know whether it is targeted at bpf or bpf-next. + +For fixes eventually landing in bpf -> net tree, the subject must +look like:: + + git format-patch --subject-prefix='PATCH bpf' start..finish + +For features/improvements/etc that should eventually land in +bpf-next -> net-next, the subject must look like:: + + git format-patch --subject-prefix='PATCH bpf-next' start..finish + +If unsure whether the patch or patch series should go into bpf +or net directly, or bpf-next or net-next directly, it is not a +problem either if the subject line says net or net-next as target. +It is eventually up to the maintainers to do the delegation of +the patches. + +If it is clear that patches should go into bpf or bpf-next tree, +please make sure to rebase the patches against those trees in +order to reduce potential conflicts. + +In case the patch or patch series has to be reworked and sent out +again in a second or later revision, it is also required to add a +version number (``v2``, ``v3``, ...) into the subject prefix:: + + git format-patch --subject-prefix='PATCH net-next v2' start..finish + +When changes have been requested to the patch series, always send the +whole patch series again with the feedback incorporated (never send +individual diffs on top of the old series). + +Q: What does it mean when a patch gets applied to bpf or bpf-next tree? +----------------------------------------------------------------------- +A: It means that the patch looks good for mainline inclusion from +a BPF point of view. + +Be aware that this is not a final verdict that the patch will +automatically get accepted into net or net-next trees eventually: + +On the netdev kernel mailing list reviews can come in at any point +in time. If discussions around a patch conclude that they cannot +get included as-is, we will either apply a follow-up fix or drop +them from the trees entirely. Therefore, we also reserve to rebase +the trees when deemed necessary. After all, the purpose of the tree +is to: + +i) accumulate and stage BPF patches for integration into trees + like net and net-next, and + +ii) run extensive BPF test suite and + workloads on the patches before they make their way any further. + +Once the BPF pull request was accepted by David S. Miller, then +the patches end up in net or net-next tree, respectively, and +make their way from there further into mainline. Again, see the +`netdev FAQ`_ for additional information e.g. on how often they are +merged to mainline. + +Q: How long do I need to wait for feedback on my BPF patches? +------------------------------------------------------------- +A: We try to keep the latency low. The usual time to feedback will +be around 2 or 3 business days. It may vary depending on the +complexity of changes and current patch load. + +Q: How often do you send pull requests to major kernel trees like net or net-next? +---------------------------------------------------------------------------------- + +A: Pull requests will be sent out rather often in order to not +accumulate too many patches in bpf or bpf-next. + +As a rule of thumb, expect pull requests for each tree regularly +at the end of the week. In some cases pull requests could additionally +come also in the middle of the week depending on the current patch +load or urgency. + +Q: Are patches applied to bpf-next when the merge window is open? +----------------------------------------------------------------- +A: For the time when the merge window is open, bpf-next will not be +processed. This is roughly analogous to net-next patch processing, +so feel free to read up on the `netdev FAQ`_ about further details. + +During those two weeks of merge window, we might ask you to resend +your patch series once bpf-next is open again. Once Linus released +a ``v*-rc1`` after the merge window, we continue processing of bpf-next. + +For non-subscribers to kernel mailing lists, there is also a status +page run by David S. Miller on net-next that provides guidance: + + http://vger.kernel.org/~davem/net-next.html + +Q: Verifier changes and test cases +---------------------------------- +Q: I made a BPF verifier change, do I need to add test cases for +BPF kernel selftests_? + +A: If the patch has changes to the behavior of the verifier, then yes, +it is absolutely necessary to add test cases to the BPF kernel +selftests_ suite. If they are not present and we think they are +needed, then we might ask for them before accepting any changes. + +In particular, test_verifier.c is tracking a high number of BPF test +cases, including a lot of corner cases that LLVM BPF back end may +generate out of the restricted C code. Thus, adding test cases is +absolutely crucial to make sure future changes do not accidentally +affect prior use-cases. Thus, treat those test cases as: verifier +behavior that is not tracked in test_verifier.c could potentially +be subject to change. + +Q: samples/bpf preference vs selftests? +--------------------------------------- +Q: When should I add code to `samples/bpf/`_ and when to BPF kernel +selftests_ ? + +A: In general, we prefer additions to BPF kernel selftests_ rather than +`samples/bpf/`_. The rationale is very simple: kernel selftests are +regularly run by various bots to test for kernel regressions. + +The more test cases we add to BPF selftests, the better the coverage +and the less likely it is that those could accidentally break. It is +not that BPF kernel selftests cannot demo how a specific feature can +be used. + +That said, `samples/bpf/`_ may be a good place for people to get started, +so it might be advisable that simple demos of features could go into +`samples/bpf/`_, but advanced functional and corner-case testing rather +into kernel selftests. + +If your sample looks like a test case, then go for BPF kernel selftests +instead! + +Q: When should I add code to the bpftool? +----------------------------------------- +A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide +a central user space tool for debugging and introspection of BPF programs +and maps that are active in the kernel. If UAPI changes related to BPF +enable for dumping additional information of programs or maps, then +bpftool should be extended as well to support dumping them. + +Q: When should I add code to iproute2's BPF loader? +--------------------------------------------------- +A: For UAPI changes related to the XDP or tc layer (e.g. ``cls_bpf``), +the convention is that those control-path related changes are added to +iproute2's BPF loader as well from user space side. This is not only +useful to have UAPI changes properly designed to be usable, but also +to make those changes available to a wider user base of major +downstream distributions. + +Q: Do you accept patches as well for iproute2's BPF loader? +----------------------------------------------------------- +A: Patches for the iproute2's BPF loader have to be sent to: + + netdev@vger.kernel.org + +While those patches are not processed by the BPF kernel maintainers, +please keep them in Cc as well, so they can be reviewed. + +The official git repository for iproute2 is run by Stephen Hemminger +and can be found at: + + https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/ + +The patches need to have a subject prefix of '``[PATCH iproute2 +master]``' or '``[PATCH iproute2 net-next]``'. '``master``' or +'``net-next``' describes the target branch where the patch should be +applied to. Meaning, if kernel changes went into the net-next kernel +tree, then the related iproute2 changes need to go into the iproute2 +net-next branch, otherwise they can be targeted at master branch. The +iproute2 net-next branch will get merged into the master branch after +the current iproute2 version from master has been released. + +Like BPF, the patches end up in patchwork under the netdev project and +are delegated to 'shemminger' for further processing: + + http://patchwork.ozlabs.org/project/netdev/list/?delegate=389 + +Q: What is the minimum requirement before I submit my BPF patches? +------------------------------------------------------------------ +A: When submitting patches, always take the time and properly test your +patches *prior* to submission. Never rush them! If maintainers find +that your patches have not been properly tested, it is a good way to +get them grumpy. Testing patch submissions is a hard requirement! + +Note, fixes that go to bpf tree *must* have a ``Fixes:`` tag included. +The same applies to fixes that target bpf-next, where the affected +commit is in net-next (or in some cases bpf-next). The ``Fixes:`` tag is +crucial in order to identify follow-up commits and tremendously helps +for people having to do backporting, so it is a must have! + +We also don't accept patches with an empty commit message. Take your +time and properly write up a high quality commit message, it is +essential! + +Think about it this way: other developers looking at your code a month +from now need to understand *why* a certain change has been done that +way, and whether there have been flaws in the analysis or assumptions +that the original author did. Thus providing a proper rationale and +describing the use-case for the changes is a must. + +Patch submissions with >1 patch must have a cover letter which includes +a high level description of the series. This high level summary will +then be placed into the merge commit by the BPF maintainers such that +it is also accessible from the git log for future reference. + +Q: Features changing BPF JIT and/or LLVM +---------------------------------------- +Q: What do I need to consider when adding a new instruction or feature +that would require BPF JIT and/or LLVM integration as well? + +A: We try hard to keep all BPF JITs up to date such that the same user +experience can be guaranteed when running BPF programs on different +architectures without having the program punt to the less efficient +interpreter in case the in-kernel BPF JIT is enabled. + +If you are unable to implement or test the required JIT changes for +certain architectures, please work together with the related BPF JIT +developers in order to get the feature implemented in a timely manner. +Please refer to the git log (``arch/*/net/``) to locate the necessary +people for helping out. + +Also always make sure to add BPF test cases (e.g. test_bpf.c and +test_verifier.c) for new instructions, so that they can receive +broad test coverage and help run-time testing the various BPF JITs. + +In case of new BPF instructions, once the changes have been accepted +into the Linux kernel, please implement support into LLVM's BPF back +end. See LLVM_ section below for further information. + +Stable submission +================= + +Q: I need a specific BPF commit in stable kernels. What should I do? +-------------------------------------------------------------------- +A: In case you need a specific fix in stable kernels, first check whether +the commit has already been applied in the related ``linux-*.y`` branches: + + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/ + +If not the case, then drop an email to the BPF maintainers with the +netdev kernel mailing list in Cc and ask for the fix to be queued up: + + netdev@vger.kernel.org + +The process in general is the same as on netdev itself, see also the +`netdev FAQ`_ document. + +Q: Do you also backport to kernels not currently maintained as stable? +---------------------------------------------------------------------- +A: No. If you need a specific BPF commit in kernels that are currently not +maintained by the stable maintainers, then you are on your own. + +The current stable and longterm stable kernels are all listed here: + + https://www.kernel.org/ + +Q: The BPF patch I am about to submit needs to go to stable as well +------------------------------------------------------------------- +What should I do? + +A: The same rules apply as with netdev patch submissions in general, see +`netdev FAQ`_ under: + + `Documentation/networking/netdev-FAQ.txt`_ + +Never add "``Cc: stable@vger.kernel.org``" to the patch description, but +ask the BPF maintainers to queue the patches instead. This can be done +with a note, for example, under the ``---`` part of the patch which does +not go into the git log. Alternatively, this can be done as a simple +request by mail instead. + +Q: Queue stable patches +----------------------- +Q: Where do I find currently queued BPF patches that will be submitted +to stable? + +A: Once patches that fix critical bugs got applied into the bpf tree, they +are queued up for stable submission under: + + http://patchwork.ozlabs.org/bundle/bpf/stable/?state=* + +They will be on hold there at minimum until the related commit made its +way into the mainline kernel tree. + +After having been under broader exposure, the queued patches will be +submitted by the BPF maintainers to the stable maintainers. + +Testing patches +=============== + +Q: How to run BPF selftests +--------------------------- +A: After you have booted into the newly compiled kernel, navigate to +the BPF selftests_ suite in order to test BPF functionality (current +working directory points to the root of the cloned git tree):: + + $ cd tools/testing/selftests/bpf/ + $ make + +To run the verifier tests:: + + $ sudo ./test_verifier + +The verifier tests print out all the current checks being +performed. The summary at the end of running all tests will dump +information of test successes and failures:: + + Summary: 418 PASSED, 0 FAILED + +In order to run through all BPF selftests, the following command is +needed:: + + $ sudo make run_tests + +See the kernels selftest `Documentation/dev-tools/kselftest.rst`_ +document for further documentation. + +Q: Which BPF kernel selftests version should I run my kernel against? +--------------------------------------------------------------------- +A: If you run a kernel ``xyz``, then always run the BPF kernel selftests +from that kernel ``xyz`` as well. Do not expect that the BPF selftest +from the latest mainline tree will pass all the time. + +In particular, test_bpf.c and test_verifier.c have a large number of +test cases and are constantly updated with new BPF test sequences, or +existing ones are adapted to verifier changes e.g. due to verifier +becoming smarter and being able to better track certain things. + +LLVM +==== + +Q: Where do I find LLVM with BPF support? +----------------------------------------- +A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1. + +All major distributions these days ship LLVM with BPF back end enabled, +so for the majority of use-cases it is not required to compile LLVM by +hand anymore, just install the distribution provided package. + +LLVM's static compiler lists the supported targets through +``llc --version``, make sure BPF targets are listed. Example:: + + $ llc --version + LLVM (http://llvm.org/): + LLVM version 6.0.0svn + Optimized build. + Default target: x86_64-unknown-linux-gnu + Host CPU: skylake + + Registered Targets: + bpf - BPF (host endian) + bpfeb - BPF (big endian) + bpfel - BPF (little endian) + x86 - 32-bit X86: Pentium-Pro and above + x86-64 - 64-bit X86: EM64T and AMD64 + +For developers in order to utilize the latest features added to LLVM's +BPF back end, it is advisable to run the latest LLVM releases. Support +for new BPF kernel features such as additions to the BPF instruction +set are often developed together. + +All LLVM releases can be found at: http://releases.llvm.org/ + +Q: Got it, so how do I build LLVM manually anyway? +-------------------------------------------------- +A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have +that set up, proceed with building the latest LLVM and clang version +from the git repositories:: + + $ git clone http://llvm.org/git/llvm.git + $ cd llvm/tools + $ git clone --depth 1 http://llvm.org/git/clang.git + $ cd ..; mkdir build; cd build + $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \ + -DBUILD_SHARED_LIBS=OFF \ + -DCMAKE_BUILD_TYPE=Release \ + -DLLVM_BUILD_RUNTIME=OFF + $ make -j $(getconf _NPROCESSORS_ONLN) + +The built binaries can then be found in the build/bin/ directory, where +you can point the PATH variable to. + +Q: Reporting LLVM BPF issues +---------------------------- +Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code +generation back end or about LLVM generated code that the verifier +refuses to accept? + +A: Yes, please do! + +LLVM's BPF back end is a key piece of the whole BPF +infrastructure and it ties deeply into verification of programs from the +kernel side. Therefore, any issues on either side need to be investigated +and fixed whenever necessary. + +Therefore, please make sure to bring them up at netdev kernel mailing +list and Cc BPF maintainers for LLVM and kernel bits: + +* Yonghong Song +* Alexei Starovoitov +* Daniel Borkmann + +LLVM also has an issue tracker where BPF related bugs can be found: + + https://bugs.llvm.org/buglist.cgi?quicksearch=bpf + +However, it is better to reach out through mailing lists with having +maintainers in Cc. + +Q: New BPF instruction for kernel and LLVM +------------------------------------------ +Q: I have added a new BPF instruction to the kernel, how can I integrate +it into LLVM? + +A: LLVM has a ``-mcpu`` selector for the BPF back end in order to allow +the selection of BPF instruction set extensions. By default the +``generic`` processor target is used, which is the base instruction set +(v1) of BPF. + +LLVM has an option to select ``-mcpu=probe`` where it will probe the host +kernel for supported BPF instruction set extensions and selects the +optimal set automatically. + +For cross-compilation, a specific version can be select manually as well :: + + $ llc -march bpf -mcpu=help + Available CPUs for this target: + + generic - Select the generic processor. + probe - Select the probe processor. + v1 - Select the v1 processor. + v2 - Select the v2 processor. + [...] + +Newly added BPF instructions to the Linux kernel need to follow the same +scheme, bump the instruction set version and implement probing for the +extensions such that ``-mcpu=probe`` users can benefit from the +optimization transparently when upgrading their kernels. + +If you are unable to implement support for the newly added BPF instruction +please reach out to BPF developers for help. + +By the way, the BPF kernel selftests run with ``-mcpu=probe`` for better +test coverage. + +Q: clang flag for target bpf? +----------------------------- +Q: In some cases clang flag ``-target bpf`` is used but in other cases the +default clang target, which matches the underlying architecture, is used. +What is the difference and when I should use which? + +A: Although LLVM IR generation and optimization try to stay architecture +independent, ``-target `` still has some impact on generated code: + +- BPF program may recursively include header file(s) with file scope + inline assembly codes. The default target can handle this well, + while ``bpf`` target may fail if bpf backend assembler does not + understand these assembly codes, which is true in most cases. + +- When compiled without ``-g``, additional elf sections, e.g., + .eh_frame and .rela.eh_frame, may be present in the object file + with default target, but not with ``bpf`` target. + +- The default target may turn a C switch statement into a switch table + lookup and jump operation. Since the switch table is placed + in the global readonly section, the bpf program will fail to load. + The bpf target does not support switch table optimization. + The clang option ``-fno-jump-tables`` can be used to disable + switch table generation. + +- For clang ``-target bpf``, it is guaranteed that pointer or long / + unsigned long types will always have a width of 64 bit, no matter + whether underlying clang binary or default target (or kernel) is + 32 bit. However, when native clang target is used, then it will + compile these types based on the underlying architecture's conventions, + meaning in case of 32 bit architecture, pointer or long / unsigned + long types e.g. in BPF context structure will have width of 32 bit + while the BPF LLVM back end still operates in 64 bit. The native + target is mostly needed in tracing for the case of walking ``pt_regs`` + or other kernel structures where CPU's register width matters. + Otherwise, ``clang -target bpf`` is generally recommended. + +You should use default target when: + +- Your program includes a header file, e.g., ptrace.h, which eventually + pulls in some header files containing file scope host assembly codes. + +- You can add ``-fno-jump-tables`` to work around the switch table issue. + +Otherwise, you can use ``bpf`` target. Additionally, you *must* use bpf target +when: + +- Your program uses data structures with pointer or long / unsigned long + types that interface with BPF helpers or context data structures. Access + into these structures is verified by the BPF verifier and may result + in verification failures if the native architecture is not aligned with + the BPF architecture, e.g. 64-bit. An example of this is + BPF_PROG_TYPE_SK_MSG require ``-target bpf`` + + +.. Links +.. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/ +.. _MAINTAINERS: ../../MAINTAINERS +.. _Documentation/networking/netdev-FAQ.txt: ../networking/netdev-FAQ.txt +.. _netdev FAQ: ../networking/netdev-FAQ.txt +.. _samples/bpf/: ../../samples/bpf/ +.. _selftests: ../../tools/testing/selftests/bpf/ +.. _Documentation/dev-tools/kselftest.rst: + https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html + +Happy BPF hacking! diff --git a/Documentation/bpf/bpf_devel_QA.txt b/Documentation/bpf/bpf_devel_QA.txt deleted file mode 100644 index da57601153a0402ebcdda8d0bafbd420bf7b72bb..0000000000000000000000000000000000000000 --- a/Documentation/bpf/bpf_devel_QA.txt +++ /dev/null @@ -1,570 +0,0 @@ -This document provides information for the BPF subsystem about various -workflows related to reporting bugs, submitting patches, and queueing -patches for stable kernels. - -For general information about submitting patches, please refer to -Documentation/process/. This document only describes additional specifics -related to BPF. - -Reporting bugs: ---------------- - -Q: How do I report bugs for BPF kernel code? - -A: Since all BPF kernel development as well as bpftool and iproute2 BPF - loader development happens through the netdev kernel mailing list, - please report any found issues around BPF to the following mailing - list: - - netdev@vger.kernel.org - - This may also include issues related to XDP, BPF tracing, etc. - - Given netdev has a high volume of traffic, please also add the BPF - maintainers to Cc (from kernel MAINTAINERS file): - - Alexei Starovoitov - Daniel Borkmann - - In case a buggy commit has already been identified, make sure to keep - the actual commit authors in Cc as well for the report. They can - typically be identified through the kernel's git tree. - - Please do *not* report BPF issues to bugzilla.kernel.org since it - is a guarantee that the reported issue will be overlooked. - -Submitting patches: -------------------- - -Q: To which mailing list do I need to submit my BPF patches? - -A: Please submit your BPF patches to the netdev kernel mailing list: - - netdev@vger.kernel.org - - Historically, BPF came out of networking and has always been maintained - by the kernel networking community. Although these days BPF touches - many other subsystems as well, the patches are still routed mainly - through the networking community. - - In case your patch has changes in various different subsystems (e.g. - tracing, security, etc), make sure to Cc the related kernel mailing - lists and maintainers from there as well, so they are able to review - the changes and provide their Acked-by's to the patches. - -Q: Where can I find patches currently under discussion for BPF subsystem? - -A: All patches that are Cc'ed to netdev are queued for review under netdev - patchwork project: - - http://patchwork.ozlabs.org/project/netdev/list/ - - Those patches which target BPF, are assigned to a 'bpf' delegate for - further processing from BPF maintainers. The current queue with - patches under review can be found at: - - https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 - - Once the patches have been reviewed by the BPF community as a whole - and approved by the BPF maintainers, their status in patchwork will be - changed to 'Accepted' and the submitter will be notified by mail. This - means that the patches look good from a BPF perspective and have been - applied to one of the two BPF kernel trees. - - In case feedback from the community requires a respin of the patches, - their status in patchwork will be set to 'Changes Requested', and purged - from the current review queue. Likewise for cases where patches would - get rejected or are not applicable to the BPF trees (but assigned to - the 'bpf' delegate). - -Q: How do the changes make their way into Linux? - -A: There are two BPF kernel trees (git repositories). Once patches have - been accepted by the BPF maintainers, they will be applied to one - of the two BPF trees: - - https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/ - https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ - - The bpf tree itself is for fixes only, whereas bpf-next for features, - cleanups or other kind of improvements ("next-like" content). This is - analogous to net and net-next trees for networking. Both bpf and - bpf-next will only have a master branch in order to simplify against - which branch patches should get rebased to. - - Accumulated BPF patches in the bpf tree will regularly get pulled - into the net kernel tree. Likewise, accumulated BPF patches accepted - into the bpf-next tree will make their way into net-next tree. net and - net-next are both run by David S. Miller. From there, they will go - into the kernel mainline tree run by Linus Torvalds. To read up on the - process of net and net-next being merged into the mainline tree, see - the netdev FAQ under: - - Documentation/networking/netdev-FAQ.txt - - Occasionally, to prevent merge conflicts, we might send pull requests - to other trees (e.g. tracing) with a small subset of the patches, but - net and net-next are always the main trees targeted for integration. - - The pull requests will contain a high-level summary of the accumulated - patches and can be searched on netdev kernel mailing list through the - following subject lines (yyyy-mm-dd is the date of the pull request): - - pull-request: bpf yyyy-mm-dd - pull-request: bpf-next yyyy-mm-dd - -Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be - applied to? - -A: The process is the very same as described in the netdev FAQ, so - please read up on it. The subject line must indicate whether the - patch is a fix or rather "next-like" content in order to let the - maintainers know whether it is targeted at bpf or bpf-next. - - For fixes eventually landing in bpf -> net tree, the subject must - look like: - - git format-patch --subject-prefix='PATCH bpf' start..finish - - For features/improvements/etc that should eventually land in - bpf-next -> net-next, the subject must look like: - - git format-patch --subject-prefix='PATCH bpf-next' start..finish - - If unsure whether the patch or patch series should go into bpf - or net directly, or bpf-next or net-next directly, it is not a - problem either if the subject line says net or net-next as target. - It is eventually up to the maintainers to do the delegation of - the patches. - - If it is clear that patches should go into bpf or bpf-next tree, - please make sure to rebase the patches against those trees in - order to reduce potential conflicts. - - In case the patch or patch series has to be reworked and sent out - again in a second or later revision, it is also required to add a - version number (v2, v3, ...) into the subject prefix: - - git format-patch --subject-prefix='PATCH net-next v2' start..finish - - When changes have been requested to the patch series, always send the - whole patch series again with the feedback incorporated (never send - individual diffs on top of the old series). - -Q: What does it mean when a patch gets applied to bpf or bpf-next tree? - -A: It means that the patch looks good for mainline inclusion from - a BPF point of view. - - Be aware that this is not a final verdict that the patch will - automatically get accepted into net or net-next trees eventually: - - On the netdev kernel mailing list reviews can come in at any point - in time. If discussions around a patch conclude that they cannot - get included as-is, we will either apply a follow-up fix or drop - them from the trees entirely. Therefore, we also reserve to rebase - the trees when deemed necessary. After all, the purpose of the tree - is to i) accumulate and stage BPF patches for integration into trees - like net and net-next, and ii) run extensive BPF test suite and - workloads on the patches before they make their way any further. - - Once the BPF pull request was accepted by David S. Miller, then - the patches end up in net or net-next tree, respectively, and - make their way from there further into mainline. Again, see the - netdev FAQ for additional information e.g. on how often they are - merged to mainline. - -Q: How long do I need to wait for feedback on my BPF patches? - -A: We try to keep the latency low. The usual time to feedback will - be around 2 or 3 business days. It may vary depending on the - complexity of changes and current patch load. - -Q: How often do you send pull requests to major kernel trees like - net or net-next? - -A: Pull requests will be sent out rather often in order to not - accumulate too many patches in bpf or bpf-next. - - As a rule of thumb, expect pull requests for each tree regularly - at the end of the week. In some cases pull requests could additionally - come also in the middle of the week depending on the current patch - load or urgency. - -Q: Are patches applied to bpf-next when the merge window is open? - -A: For the time when the merge window is open, bpf-next will not be - processed. This is roughly analogous to net-next patch processing, - so feel free to read up on the netdev FAQ about further details. - - During those two weeks of merge window, we might ask you to resend - your patch series once bpf-next is open again. Once Linus released - a v*-rc1 after the merge window, we continue processing of bpf-next. - - For non-subscribers to kernel mailing lists, there is also a status - page run by David S. Miller on net-next that provides guidance: - - http://vger.kernel.org/~davem/net-next.html - -Q: I made a BPF verifier change, do I need to add test cases for - BPF kernel selftests? - -A: If the patch has changes to the behavior of the verifier, then yes, - it is absolutely necessary to add test cases to the BPF kernel - selftests suite. If they are not present and we think they are - needed, then we might ask for them before accepting any changes. - - In particular, test_verifier.c is tracking a high number of BPF test - cases, including a lot of corner cases that LLVM BPF back end may - generate out of the restricted C code. Thus, adding test cases is - absolutely crucial to make sure future changes do not accidentally - affect prior use-cases. Thus, treat those test cases as: verifier - behavior that is not tracked in test_verifier.c could potentially - be subject to change. - -Q: When should I add code to samples/bpf/ and when to BPF kernel - selftests? - -A: In general, we prefer additions to BPF kernel selftests rather than - samples/bpf/. The rationale is very simple: kernel selftests are - regularly run by various bots to test for kernel regressions. - - The more test cases we add to BPF selftests, the better the coverage - and the less likely it is that those could accidentally break. It is - not that BPF kernel selftests cannot demo how a specific feature can - be used. - - That said, samples/bpf/ may be a good place for people to get started, - so it might be advisable that simple demos of features could go into - samples/bpf/, but advanced functional and corner-case testing rather - into kernel selftests. - - If your sample looks like a test case, then go for BPF kernel selftests - instead! - -Q: When should I add code to the bpftool? - -A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide - a central user space tool for debugging and introspection of BPF programs - and maps that are active in the kernel. If UAPI changes related to BPF - enable for dumping additional information of programs or maps, then - bpftool should be extended as well to support dumping them. - -Q: When should I add code to iproute2's BPF loader? - -A: For UAPI changes related to the XDP or tc layer (e.g. cls_bpf), the - convention is that those control-path related changes are added to - iproute2's BPF loader as well from user space side. This is not only - useful to have UAPI changes properly designed to be usable, but also - to make those changes available to a wider user base of major - downstream distributions. - -Q: Do you accept patches as well for iproute2's BPF loader? - -A: Patches for the iproute2's BPF loader have to be sent to: - - netdev@vger.kernel.org - - While those patches are not processed by the BPF kernel maintainers, - please keep them in Cc as well, so they can be reviewed. - - The official git repository for iproute2 is run by Stephen Hemminger - and can be found at: - - https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/ - - The patches need to have a subject prefix of '[PATCH iproute2 master]' - or '[PATCH iproute2 net-next]'. 'master' or 'net-next' describes the - target branch where the patch should be applied to. Meaning, if kernel - changes went into the net-next kernel tree, then the related iproute2 - changes need to go into the iproute2 net-next branch, otherwise they - can be targeted at master branch. The iproute2 net-next branch will get - merged into the master branch after the current iproute2 version from - master has been released. - - Like BPF, the patches end up in patchwork under the netdev project and - are delegated to 'shemminger' for further processing: - - http://patchwork.ozlabs.org/project/netdev/list/?delegate=389 - -Q: What is the minimum requirement before I submit my BPF patches? - -A: When submitting patches, always take the time and properly test your - patches *prior* to submission. Never rush them! If maintainers find - that your patches have not been properly tested, it is a good way to - get them grumpy. Testing patch submissions is a hard requirement! - - Note, fixes that go to bpf tree *must* have a Fixes: tag included. The - same applies to fixes that target bpf-next, where the affected commit - is in net-next (or in some cases bpf-next). The Fixes: tag is crucial - in order to identify follow-up commits and tremendously helps for people - having to do backporting, so it is a must have! - - We also don't accept patches with an empty commit message. Take your - time and properly write up a high quality commit message, it is - essential! - - Think about it this way: other developers looking at your code a month - from now need to understand *why* a certain change has been done that - way, and whether there have been flaws in the analysis or assumptions - that the original author did. Thus providing a proper rationale and - describing the use-case for the changes is a must. - - Patch submissions with >1 patch must have a cover letter which includes - a high level description of the series. This high level summary will - then be placed into the merge commit by the BPF maintainers such that - it is also accessible from the git log for future reference. - -Q: What do I need to consider when adding a new instruction or feature - that would require BPF JIT and/or LLVM integration as well? - -A: We try hard to keep all BPF JITs up to date such that the same user - experience can be guaranteed when running BPF programs on different - architectures without having the program punt to the less efficient - interpreter in case the in-kernel BPF JIT is enabled. - - If you are unable to implement or test the required JIT changes for - certain architectures, please work together with the related BPF JIT - developers in order to get the feature implemented in a timely manner. - Please refer to the git log (arch/*/net/) to locate the necessary - people for helping out. - - Also always make sure to add BPF test cases (e.g. test_bpf.c and - test_verifier.c) for new instructions, so that they can receive - broad test coverage and help run-time testing the various BPF JITs. - - In case of new BPF instructions, once the changes have been accepted - into the Linux kernel, please implement support into LLVM's BPF back - end. See LLVM section below for further information. - -Stable submission: ------------------- - -Q: I need a specific BPF commit in stable kernels. What should I do? - -A: In case you need a specific fix in stable kernels, first check whether - the commit has already been applied in the related linux-*.y branches: - - https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/ - - If not the case, then drop an email to the BPF maintainers with the - netdev kernel mailing list in Cc and ask for the fix to be queued up: - - netdev@vger.kernel.org - - The process in general is the same as on netdev itself, see also the - netdev FAQ document. - -Q: Do you also backport to kernels not currently maintained as stable? - -A: No. If you need a specific BPF commit in kernels that are currently not - maintained by the stable maintainers, then you are on your own. - - The current stable and longterm stable kernels are all listed here: - - https://www.kernel.org/ - -Q: The BPF patch I am about to submit needs to go to stable as well. What - should I do? - -A: The same rules apply as with netdev patch submissions in general, see - netdev FAQ under: - - Documentation/networking/netdev-FAQ.txt - - Never add "Cc: stable@vger.kernel.org" to the patch description, but - ask the BPF maintainers to queue the patches instead. This can be done - with a note, for example, under the "---" part of the patch which does - not go into the git log. Alternatively, this can be done as a simple - request by mail instead. - -Q: Where do I find currently queued BPF patches that will be submitted - to stable? - -A: Once patches that fix critical bugs got applied into the bpf tree, they - are queued up for stable submission under: - - http://patchwork.ozlabs.org/bundle/bpf/stable/?state=* - - They will be on hold there at minimum until the related commit made its - way into the mainline kernel tree. - - After having been under broader exposure, the queued patches will be - submitted by the BPF maintainers to the stable maintainers. - -Testing patches: ----------------- - -Q: Which BPF kernel selftests version should I run my kernel against? - -A: If you run a kernel xyz, then always run the BPF kernel selftests from - that kernel xyz as well. Do not expect that the BPF selftest from the - latest mainline tree will pass all the time. - - In particular, test_bpf.c and test_verifier.c have a large number of - test cases and are constantly updated with new BPF test sequences, or - existing ones are adapted to verifier changes e.g. due to verifier - becoming smarter and being able to better track certain things. - -LLVM: ------ - -Q: Where do I find LLVM with BPF support? - -A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1. - - All major distributions these days ship LLVM with BPF back end enabled, - so for the majority of use-cases it is not required to compile LLVM by - hand anymore, just install the distribution provided package. - - LLVM's static compiler lists the supported targets through 'llc --version', - make sure BPF targets are listed. Example: - - $ llc --version - LLVM (http://llvm.org/): - LLVM version 6.0.0svn - Optimized build. - Default target: x86_64-unknown-linux-gnu - Host CPU: skylake - - Registered Targets: - bpf - BPF (host endian) - bpfeb - BPF (big endian) - bpfel - BPF (little endian) - x86 - 32-bit X86: Pentium-Pro and above - x86-64 - 64-bit X86: EM64T and AMD64 - - For developers in order to utilize the latest features added to LLVM's - BPF back end, it is advisable to run the latest LLVM releases. Support - for new BPF kernel features such as additions to the BPF instruction - set are often developed together. - - All LLVM releases can be found at: http://releases.llvm.org/ - -Q: Got it, so how do I build LLVM manually anyway? - -A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have - that set up, proceed with building the latest LLVM and clang version - from the git repositories: - - $ git clone http://llvm.org/git/llvm.git - $ cd llvm/tools - $ git clone --depth 1 http://llvm.org/git/clang.git - $ cd ..; mkdir build; cd build - $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \ - -DBUILD_SHARED_LIBS=OFF \ - -DCMAKE_BUILD_TYPE=Release \ - -DLLVM_BUILD_RUNTIME=OFF - $ make -j $(getconf _NPROCESSORS_ONLN) - - The built binaries can then be found in the build/bin/ directory, where - you can point the PATH variable to. - -Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code - generation back end or about LLVM generated code that the verifier - refuses to accept? - -A: Yes, please do! LLVM's BPF back end is a key piece of the whole BPF - infrastructure and it ties deeply into verification of programs from the - kernel side. Therefore, any issues on either side need to be investigated - and fixed whenever necessary. - - Therefore, please make sure to bring them up at netdev kernel mailing - list and Cc BPF maintainers for LLVM and kernel bits: - - Yonghong Song - Alexei Starovoitov - Daniel Borkmann - - LLVM also has an issue tracker where BPF related bugs can be found: - - https://bugs.llvm.org/buglist.cgi?quicksearch=bpf - - However, it is better to reach out through mailing lists with having - maintainers in Cc. - -Q: I have added a new BPF instruction to the kernel, how can I integrate - it into LLVM? - -A: LLVM has a -mcpu selector for the BPF back end in order to allow the - selection of BPF instruction set extensions. By default the 'generic' - processor target is used, which is the base instruction set (v1) of BPF. - - LLVM has an option to select -mcpu=probe where it will probe the host - kernel for supported BPF instruction set extensions and selects the - optimal set automatically. - - For cross-compilation, a specific version can be select manually as well. - - $ llc -march bpf -mcpu=help - Available CPUs for this target: - - generic - Select the generic processor. - probe - Select the probe processor. - v1 - Select the v1 processor. - v2 - Select the v2 processor. - [...] - - Newly added BPF instructions to the Linux kernel need to follow the same - scheme, bump the instruction set version and implement probing for the - extensions such that -mcpu=probe users can benefit from the optimization - transparently when upgrading their kernels. - - If you are unable to implement support for the newly added BPF instruction - please reach out to BPF developers for help. - - By the way, the BPF kernel selftests run with -mcpu=probe for better - test coverage. - -Q: In some cases clang flag "-target bpf" is used but in other cases the - default clang target, which matches the underlying architecture, is used. - What is the difference and when I should use which? - -A: Although LLVM IR generation and optimization try to stay architecture - independent, "-target " still has some impact on generated code: - - - BPF program may recursively include header file(s) with file scope - inline assembly codes. The default target can handle this well, - while bpf target may fail if bpf backend assembler does not - understand these assembly codes, which is true in most cases. - - - When compiled without -g, additional elf sections, e.g., - .eh_frame and .rela.eh_frame, may be present in the object file - with default target, but not with bpf target. - - - The default target may turn a C switch statement into a switch table - lookup and jump operation. Since the switch table is placed - in the global readonly section, the bpf program will fail to load. - The bpf target does not support switch table optimization. - The clang option "-fno-jump-tables" can be used to disable - switch table generation. - - - For clang -target bpf, it is guaranteed that pointer or long / - unsigned long types will always have a width of 64 bit, no matter - whether underlying clang binary or default target (or kernel) is - 32 bit. However, when native clang target is used, then it will - compile these types based on the underlying architecture's conventions, - meaning in case of 32 bit architecture, pointer or long / unsigned - long types e.g. in BPF context structure will have width of 32 bit - while the BPF LLVM back end still operates in 64 bit. The native - target is mostly needed in tracing for the case of walking pt_regs - or other kernel structures where CPU's register width matters. - Otherwise, clang -target bpf is generally recommended. - - You should use default target when: - - - Your program includes a header file, e.g., ptrace.h, which eventually - pulls in some header files containing file scope host assembly codes. - - You can add "-fno-jump-tables" to work around the switch table issue. - - Otherwise, you can use bpf target. Additionally, you _must_ use bpf target - when: - - - Your program uses data structures with pointer or long / unsigned long - types that interface with BPF helpers or context data structures. Access - into these structures is verified by the BPF verifier and may result - in verification failures if the native architecture is not aligned with - the BPF architecture, e.g. 64-bit. An example of this is - BPF_PROG_TYPE_SK_MSG require '-target bpf' - -Happy BPF hacking! diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt deleted file mode 100644 index 74cdeaed9f7afb4f0b514f1213b38b2eda106f56..0000000000000000000000000000000000000000 --- a/Documentation/cgroup-v2.txt +++ /dev/null @@ -1,1998 +0,0 @@ -================ -Control Group v2 -================ - -:Date: October, 2015 -:Author: Tejun Heo - -This is the authoritative documentation on the design, interface and -conventions of cgroup v2. It describes all userland-visible aspects -of cgroup including core and specific controller behaviors. All -future changes must be reflected in this document. Documentation for -v1 is available under Documentation/cgroup-v1/. - -.. CONTENTS - - 1. Introduction - 1-1. Terminology - 1-2. What is cgroup? - 2. Basic Operations - 2-1. Mounting - 2-2. Organizing Processes and Threads - 2-2-1. Processes - 2-2-2. Threads - 2-3. [Un]populated Notification - 2-4. Controlling Controllers - 2-4-1. Enabling and Disabling - 2-4-2. Top-down Constraint - 2-4-3. No Internal Process Constraint - 2-5. Delegation - 2-5-1. Model of Delegation - 2-5-2. Delegation Containment - 2-6. Guidelines - 2-6-1. Organize Once and Control - 2-6-2. Avoid Name Collisions - 3. Resource Distribution Models - 3-1. Weights - 3-2. Limits - 3-3. Protections - 3-4. Allocations - 4. Interface Files - 4-1. Format - 4-2. Conventions - 4-3. Core Interface Files - 5. Controllers - 5-1. CPU - 5-1-1. CPU Interface Files - 5-2. Memory - 5-2-1. Memory Interface Files - 5-2-2. Usage Guidelines - 5-2-3. Memory Ownership - 5-3. IO - 5-3-1. IO Interface Files - 5-3-2. Writeback - 5-4. PID - 5-4-1. PID Interface Files - 5-5. Device - 5-6. RDMA - 5-6-1. RDMA Interface Files - 5-7. Misc - 5-7-1. perf_event - 5-N. Non-normative information - 5-N-1. CPU controller root cgroup process behaviour - 5-N-2. IO controller root cgroup process behaviour - 6. Namespace - 6-1. Basics - 6-2. The Root and Views - 6-3. Migration and setns(2) - 6-4. Interaction with Other Namespaces - P. Information on Kernel Programming - P-1. Filesystem Support for Writeback - D. Deprecated v1 Core Features - R. Issues with v1 and Rationales for v2 - R-1. Multiple Hierarchies - R-2. Thread Granularity - R-3. Competition Between Inner Nodes and Threads - R-4. Other Interface Issues - R-5. Controller Issues and Remedies - R-5-1. Memory - - -Introduction -============ - -Terminology ------------ - -"cgroup" stands for "control group" and is never capitalized. The -singular form is used to designate the whole feature and also as a -qualifier as in "cgroup controllers". When explicitly referring to -multiple individual control groups, the plural form "cgroups" is used. - - -What is cgroup? ---------------- - -cgroup is a mechanism to organize processes hierarchically and -distribute system resources along the hierarchy in a controlled and -configurable manner. - -cgroup is largely composed of two parts - the core and controllers. -cgroup core is primarily responsible for hierarchically organizing -processes. A cgroup controller is usually responsible for -distributing a specific type of system resource along the hierarchy -although there are utility controllers which serve purposes other than -resource distribution. - -cgroups form a tree structure and every process in the system belongs -to one and only one cgroup. All threads of a process belong to the -same cgroup. On creation, all processes are put in the cgroup that -the parent process belongs to at the time. A process can be migrated -to another cgroup. Migration of a process doesn't affect already -existing descendant processes. - -Following certain structural constraints, controllers may be enabled or -disabled selectively on a cgroup. All controller behaviors are -hierarchical - if a controller is enabled on a cgroup, it affects all -processes which belong to the cgroups consisting the inclusive -sub-hierarchy of the cgroup. When a controller is enabled on a nested -cgroup, it always restricts the resource distribution further. The -restrictions set closer to the root in the hierarchy can not be -overridden from further away. - - -Basic Operations -================ - -Mounting --------- - -Unlike v1, cgroup v2 has only single hierarchy. The cgroup v2 -hierarchy can be mounted with the following mount command:: - - # mount -t cgroup2 none $MOUNT_POINT - -cgroup2 filesystem has the magic number 0x63677270 ("cgrp"). All -controllers which support v2 and are not bound to a v1 hierarchy are -automatically bound to the v2 hierarchy and show up at the root. -Controllers which are not in active use in the v2 hierarchy can be -bound to other hierarchies. This allows mixing v2 hierarchy with the -legacy v1 multiple hierarchies in a fully backward compatible way. - -A controller can be moved across hierarchies only after the controller -is no longer referenced in its current hierarchy. Because per-cgroup -controller states are destroyed asynchronously and controllers may -have lingering references, a controller may not show up immediately on -the v2 hierarchy after the final umount of the previous hierarchy. -Similarly, a controller should be fully disabled to be moved out of -the unified hierarchy and it may take some time for the disabled -controller to become available for other hierarchies; furthermore, due -to inter-controller dependencies, other controllers may need to be -disabled too. - -While useful for development and manual configurations, moving -controllers dynamically between the v2 and other hierarchies is -strongly discouraged for production use. It is recommended to decide -the hierarchies and controller associations before starting using the -controllers after system boot. - -During transition to v2, system management software might still -automount the v1 cgroup filesystem and so hijack all controllers -during boot, before manual intervention is possible. To make testing -and experimenting easier, the kernel parameter cgroup_no_v1= allows -disabling controllers in v1 and make them always available in v2. - -cgroup v2 currently supports the following mount options. - - nsdelegate - - Consider cgroup namespaces as delegation boundaries. This - option is system wide and can only be set on mount or modified - through remount from the init namespace. The mount option is - ignored on non-init namespace mounts. Please refer to the - Delegation section for details. - - -Organizing Processes and Threads --------------------------------- - -Processes -~~~~~~~~~ - -Initially, only the root cgroup exists to which all processes belong. -A child cgroup can be created by creating a sub-directory:: - - # mkdir $CGROUP_NAME - -A given cgroup may have multiple child cgroups forming a tree -structure. Each cgroup has a read-writable interface file -"cgroup.procs". When read, it lists the PIDs of all processes which -belong to the cgroup one-per-line. The PIDs are not ordered and the -same PID may show up more than once if the process got moved to -another cgroup and then back or the PID got recycled while reading. - -A process can be migrated into a cgroup by writing its PID to the -target cgroup's "cgroup.procs" file. Only one process can be migrated -on a single write(2) call. If a process is composed of multiple -threads, writing the PID of any thread migrates all threads of the -process. - -When a process forks a child process, the new process is born into the -cgroup that the forking process belongs to at the time of the -operation. After exit, a process stays associated with the cgroup -that it belonged to at the time of exit until it's reaped; however, a -zombie process does not appear in "cgroup.procs" and thus can't be -moved to another cgroup. - -A cgroup which doesn't have any children or live processes can be -destroyed by removing the directory. Note that a cgroup which doesn't -have any children and is associated only with zombie processes is -considered empty and can be removed:: - - # rmdir $CGROUP_NAME - -"/proc/$PID/cgroup" lists a process's cgroup membership. If legacy -cgroup is in use in the system, this file may contain multiple lines, -one for each hierarchy. The entry for cgroup v2 is always in the -format "0::$PATH":: - - # cat /proc/842/cgroup - ... - 0::/test-cgroup/test-cgroup-nested - -If the process becomes a zombie and the cgroup it was associated with -is removed subsequently, " (deleted)" is appended to the path:: - - # cat /proc/842/cgroup - ... - 0::/test-cgroup/test-cgroup-nested (deleted) - - -Threads -~~~~~~~ - -cgroup v2 supports thread granularity for a subset of controllers to -support use cases requiring hierarchical resource distribution across -the threads of a group of processes. By default, all threads of a -process belong to the same cgroup, which also serves as the resource -domain to host resource consumptions which are not specific to a -process or thread. The thread mode allows threads to be spread across -a subtree while still maintaining the common resource domain for them. - -Controllers which support thread mode are called threaded controllers. -The ones which don't are called domain controllers. - -Marking a cgroup threaded makes it join the resource domain of its -parent as a threaded cgroup. The parent may be another threaded -cgroup whose resource domain is further up in the hierarchy. The root -of a threaded subtree, that is, the nearest ancestor which is not -threaded, is called threaded domain or thread root interchangeably and -serves as the resource domain for the entire subtree. - -Inside a threaded subtree, threads of a process can be put in -different cgroups and are not subject to the no internal process -constraint - threaded controllers can be enabled on non-leaf cgroups -whether they have threads in them or not. - -As the threaded domain cgroup hosts all the domain resource -consumptions of the subtree, it is considered to have internal -resource consumptions whether there are processes in it or not and -can't have populated child cgroups which aren't threaded. Because the -root cgroup is not subject to no internal process constraint, it can -serve both as a threaded domain and a parent to domain cgroups. - -The current operation mode or type of the cgroup is shown in the -"cgroup.type" file which indicates whether the cgroup is a normal -domain, a domain which is serving as the domain of a threaded subtree, -or a threaded cgroup. - -On creation, a cgroup is always a domain cgroup and can be made -threaded by writing "threaded" to the "cgroup.type" file. The -operation is single direction:: - - # echo threaded > cgroup.type - -Once threaded, the cgroup can't be made a domain again. To enable the -thread mode, the following conditions must be met. - -- As the cgroup will join the parent's resource domain. The parent - must either be a valid (threaded) domain or a threaded cgroup. - -- When the parent is an unthreaded domain, it must not have any domain - controllers enabled or populated domain children. The root is - exempt from this requirement. - -Topology-wise, a cgroup can be in an invalid state. Please consider -the following topology:: - - A (threaded domain) - B (threaded) - C (domain, just created) - -C is created as a domain but isn't connected to a parent which can -host child domains. C can't be used until it is turned into a -threaded cgroup. "cgroup.type" file will report "domain (invalid)" in -these cases. Operations which fail due to invalid topology use -EOPNOTSUPP as the errno. - -A domain cgroup is turned into a threaded domain when one of its child -cgroup becomes threaded or threaded controllers are enabled in the -"cgroup.subtree_control" file while there are processes in the cgroup. -A threaded domain reverts to a normal domain when the conditions -clear. - -When read, "cgroup.threads" contains the list of the thread IDs of all -threads in the cgroup. Except that the operations are per-thread -instead of per-process, "cgroup.threads" has the same format and -behaves the same way as "cgroup.procs". While "cgroup.threads" can be -written to in any cgroup, as it can only move threads inside the same -threaded domain, its operations are confined inside each threaded -subtree. - -The threaded domain cgroup serves as the resource domain for the whole -subtree, and, while the threads can be scattered across the subtree, -all the processes are considered to be in the threaded domain cgroup. -"cgroup.procs" in a threaded domain cgroup contains the PIDs of all -processes in the subtree and is not readable in the subtree proper. -However, "cgroup.procs" can be written to from anywhere in the subtree -to migrate all threads of the matching process to the cgroup. - -Only threaded controllers can be enabled in a threaded subtree. When -a threaded controller is enabled inside a threaded subtree, it only -accounts for and controls resource consumptions associated with the -threads in the cgroup and its descendants. All consumptions which -aren't tied to a specific thread belong to the threaded domain cgroup. - -Because a threaded subtree is exempt from no internal process -constraint, a threaded controller must be able to handle competition -between threads in a non-leaf cgroup and its child cgroups. Each -threaded controller defines how such competitions are handled. - - -[Un]populated Notification --------------------------- - -Each non-root cgroup has a "cgroup.events" file which contains -"populated" field indicating whether the cgroup's sub-hierarchy has -live processes in it. Its value is 0 if there is no live process in -the cgroup and its descendants; otherwise, 1. poll and [id]notify -events are triggered when the value changes. This can be used, for -example, to start a clean-up operation after all processes of a given -sub-hierarchy have exited. The populated state updates and -notifications are recursive. Consider the following sub-hierarchy -where the numbers in the parentheses represent the numbers of processes -in each cgroup:: - - A(4) - B(0) - C(1) - \ D(0) - -A, B and C's "populated" fields would be 1 while D's 0. After the one -process in C exits, B and C's "populated" fields would flip to "0" and -file modified events will be generated on the "cgroup.events" files of -both cgroups. - - -Controlling Controllers ------------------------ - -Enabling and Disabling -~~~~~~~~~~~~~~~~~~~~~~ - -Each cgroup has a "cgroup.controllers" file which lists all -controllers available for the cgroup to enable:: - - # cat cgroup.controllers - cpu io memory - -No controller is enabled by default. Controllers can be enabled and -disabled by writing to the "cgroup.subtree_control" file:: - - # echo "+cpu +memory -io" > cgroup.subtree_control - -Only controllers which are listed in "cgroup.controllers" can be -enabled. When multiple operations are specified as above, either they -all succeed or fail. If multiple operations on the same controller -are specified, the last one is effective. - -Enabling a controller in a cgroup indicates that the distribution of -the target resource across its immediate children will be controlled. -Consider the following sub-hierarchy. The enabled controllers are -listed in parentheses:: - - A(cpu,memory) - B(memory) - C() - \ D() - -As A has "cpu" and "memory" enabled, A will control the distribution -of CPU cycles and memory to its children, in this case, B. As B has -"memory" enabled but not "CPU", C and D will compete freely on CPU -cycles but their division of memory available to B will be controlled. - -As a controller regulates the distribution of the target resource to -the cgroup's children, enabling it creates the controller's interface -files in the child cgroups. In the above example, enabling "cpu" on B -would create the "cpu." prefixed controller interface files in C and -D. Likewise, disabling "memory" from B would remove the "memory." -prefixed controller interface files from C and D. This means that the -controller interface files - anything which doesn't start with -"cgroup." are owned by the parent rather than the cgroup itself. - - -Top-down Constraint -~~~~~~~~~~~~~~~~~~~ - -Resources are distributed top-down and a cgroup can further distribute -a resource only if the resource has been distributed to it from the -parent. This means that all non-root "cgroup.subtree_control" files -can only contain controllers which are enabled in the parent's -"cgroup.subtree_control" file. A controller can be enabled only if -the parent has the controller enabled and a controller can't be -disabled if one or more children have it enabled. - - -No Internal Process Constraint -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Non-root cgroups can distribute domain resources to their children -only when they don't have any processes of their own. In other words, -only domain cgroups which don't contain any processes can have domain -controllers enabled in their "cgroup.subtree_control" files. - -This guarantees that, when a domain controller is looking at the part -of the hierarchy which has it enabled, processes are always only on -the leaves. This rules out situations where child cgroups compete -against internal processes of the parent. - -The root cgroup is exempt from this restriction. Root contains -processes and anonymous resource consumption which can't be associated -with any other cgroups and requires special treatment from most -controllers. How resource consumption in the root cgroup is governed -is up to each controller (for more information on this topic please -refer to the Non-normative information section in the Controllers -chapter). - -Note that the restriction doesn't get in the way if there is no -enabled controller in the cgroup's "cgroup.subtree_control". This is -important as otherwise it wouldn't be possible to create children of a -populated cgroup. To control resource distribution of a cgroup, the -cgroup must create children and transfer all its processes to the -children before enabling controllers in its "cgroup.subtree_control" -file. - - -Delegation ----------- - -Model of Delegation -~~~~~~~~~~~~~~~~~~~ - -A cgroup can be delegated in two ways. First, to a less privileged -user by granting write access of the directory and its "cgroup.procs", -"cgroup.threads" and "cgroup.subtree_control" files to the user. -Second, if the "nsdelegate" mount option is set, automatically to a -cgroup namespace on namespace creation. - -Because the resource control interface files in a given directory -control the distribution of the parent's resources, the delegatee -shouldn't be allowed to write to them. For the first method, this is -achieved by not granting access to these files. For the second, the -kernel rejects writes to all files other than "cgroup.procs" and -"cgroup.subtree_control" on a namespace root from inside the -namespace. - -The end results are equivalent for both delegation types. Once -delegated, the user can build sub-hierarchy under the directory, -organize processes inside it as it sees fit and further distribute the -resources it received from the parent. The limits and other settings -of all resource controllers are hierarchical and regardless of what -happens in the delegated sub-hierarchy, nothing can escape the -resource restrictions imposed by the parent. - -Currently, cgroup doesn't impose any restrictions on the number of -cgroups in or nesting depth of a delegated sub-hierarchy; however, -this may be limited explicitly in the future. - - -Delegation Containment -~~~~~~~~~~~~~~~~~~~~~~ - -A delegated sub-hierarchy is contained in the sense that processes -can't be moved into or out of the sub-hierarchy by the delegatee. - -For delegations to a less privileged user, this is achieved by -requiring the following conditions for a process with a non-root euid -to migrate a target process into a cgroup by writing its PID to the -"cgroup.procs" file. - -- The writer must have write access to the "cgroup.procs" file. - -- The writer must have write access to the "cgroup.procs" file of the - common ancestor of the source and destination cgroups. - -The above two constraints ensure that while a delegatee may migrate -processes around freely in the delegated sub-hierarchy it can't pull -in from or push out to outside the sub-hierarchy. - -For an example, let's assume cgroups C0 and C1 have been delegated to -user U0 who created C00, C01 under C0 and C10 under C1 as follows and -all processes under C0 and C1 belong to U0:: - - ~~~~~~~~~~~~~ - C0 - C00 - ~ cgroup ~ \ C01 - ~ hierarchy ~ - ~~~~~~~~~~~~~ - C1 - C10 - -Let's also say U0 wants to write the PID of a process which is -currently in C10 into "C00/cgroup.procs". U0 has write access to the -file; however, the common ancestor of the source cgroup C10 and the -destination cgroup C00 is above the points of delegation and U0 would -not have write access to its "cgroup.procs" files and thus the write -will be denied with -EACCES. - -For delegations to namespaces, containment is achieved by requiring -that both the source and destination cgroups are reachable from the -namespace of the process which is attempting the migration. If either -is not reachable, the migration is rejected with -ENOENT. - - -Guidelines ----------- - -Organize Once and Control -~~~~~~~~~~~~~~~~~~~~~~~~~ - -Migrating a process across cgroups is a relatively expensive operation -and stateful resources such as memory are not moved together with the -process. This is an explicit design decision as there often exist -inherent trade-offs between migration and various hot paths in terms -of synchronization cost. - -As such, migrating processes across cgroups frequently as a means to -apply different resource restrictions is discouraged. A workload -should be assigned to a cgroup according to the system's logical and -resource structure once on start-up. Dynamic adjustments to resource -distribution can be made by changing controller configuration through -the interface files. - - -Avoid Name Collisions -~~~~~~~~~~~~~~~~~~~~~ - -Interface files for a cgroup and its children cgroups occupy the same -directory and it is possible to create children cgroups which collide -with interface files. - -All cgroup core interface files are prefixed with "cgroup." and each -controller's interface files are prefixed with the controller name and -a dot. A controller's name is composed of lower case alphabets and -'_'s but never begins with an '_' so it can be used as the prefix -character for collision avoidance. Also, interface file names won't -start or end with terms which are often used in categorizing workloads -such as job, service, slice, unit or workload. - -cgroup doesn't do anything to prevent name collisions and it's the -user's responsibility to avoid them. - - -Resource Distribution Models -============================ - -cgroup controllers implement several resource distribution schemes -depending on the resource type and expected use cases. This section -describes major schemes in use along with their expected behaviors. - - -Weights -------- - -A parent's resource is distributed by adding up the weights of all -active children and giving each the fraction matching the ratio of its -weight against the sum. As only children which can make use of the -resource at the moment participate in the distribution, this is -work-conserving. Due to the dynamic nature, this model is usually -used for stateless resources. - -All weights are in the range [1, 10000] with the default at 100. This -allows symmetric multiplicative biases in both directions at fine -enough granularity while staying in the intuitive range. - -As long as the weight is in range, all configuration combinations are -valid and there is no reason to reject configuration changes or -process migrations. - -"cpu.weight" proportionally distributes CPU cycles to active children -and is an example of this type. - - -Limits ------- - -A child can only consume upto the configured amount of the resource. -Limits can be over-committed - the sum of the limits of children can -exceed the amount of resource available to the parent. - -Limits are in the range [0, max] and defaults to "max", which is noop. - -As limits can be over-committed, all configuration combinations are -valid and there is no reason to reject configuration changes or -process migrations. - -"io.max" limits the maximum BPS and/or IOPS that a cgroup can consume -on an IO device and is an example of this type. - - -Protections ------------ - -A cgroup is protected to be allocated upto the configured amount of -the resource if the usages of all its ancestors are under their -protected levels. Protections can be hard guarantees or best effort -soft boundaries. Protections can also be over-committed in which case -only upto the amount available to the parent is protected among -children. - -Protections are in the range [0, max] and defaults to 0, which is -noop. - -As protections can be over-committed, all configuration combinations -are valid and there is no reason to reject configuration changes or -process migrations. - -"memory.low" implements best-effort memory protection and is an -example of this type. - - -Allocations ------------ - -A cgroup is exclusively allocated a certain amount of a finite -resource. Allocations can't be over-committed - the sum of the -allocations of children can not exceed the amount of resource -available to the parent. - -Allocations are in the range [0, max] and defaults to 0, which is no -resource. - -As allocations can't be over-committed, some configuration -combinations are invalid and should be rejected. Also, if the -resource is mandatory for execution of processes, process migrations -may be rejected. - -"cpu.rt.max" hard-allocates realtime slices and is an example of this -type. - - -Interface Files -=============== - -Format ------- - -All interface files should be in one of the following formats whenever -possible:: - - New-line separated values - (when only one value can be written at once) - - VAL0\n - VAL1\n - ... - - Space separated values - (when read-only or multiple values can be written at once) - - VAL0 VAL1 ...\n - - Flat keyed - - KEY0 VAL0\n - KEY1 VAL1\n - ... - - Nested keyed - - KEY0 SUB_KEY0=VAL00 SUB_KEY1=VAL01... - KEY1 SUB_KEY0=VAL10 SUB_KEY1=VAL11... - ... - -For a writable file, the format for writing should generally match -reading; however, controllers may allow omitting later fields or -implement restricted shortcuts for most common use cases. - -For both flat and nested keyed files, only the values for a single key -can be written at a time. For nested keyed files, the sub key pairs -may be specified in any order and not all pairs have to be specified. - - -Conventions ------------ - -- Settings for a single feature should be contained in a single file. - -- The root cgroup should be exempt from resource control and thus - shouldn't have resource control interface files. Also, - informational files on the root cgroup which end up showing global - information available elsewhere shouldn't exist. - -- If a controller implements weight based resource distribution, its - interface file should be named "weight" and have the range [1, - 10000] with 100 as the default. The values are chosen to allow - enough and symmetric bias in both directions while keeping it - intuitive (the default is 100%). - -- If a controller implements an absolute resource guarantee and/or - limit, the interface files should be named "min" and "max" - respectively. If a controller implements best effort resource - guarantee and/or limit, the interface files should be named "low" - and "high" respectively. - - In the above four control files, the special token "max" should be - used to represent upward infinity for both reading and writing. - -- If a setting has a configurable default value and keyed specific - overrides, the default entry should be keyed with "default" and - appear as the first entry in the file. - - The default value can be updated by writing either "default $VAL" or - "$VAL". - - When writing to update a specific override, "default" can be used as - the value to indicate removal of the override. Override entries - with "default" as the value must not appear when read. - - For example, a setting which is keyed by major:minor device numbers - with integer values may look like the following:: - - # cat cgroup-example-interface-file - default 150 - 8:0 300 - - The default value can be updated by:: - - # echo 125 > cgroup-example-interface-file - - or:: - - # echo "default 125" > cgroup-example-interface-file - - An override can be set by:: - - # echo "8:16 170" > cgroup-example-interface-file - - and cleared by:: - - # echo "8:0 default" > cgroup-example-interface-file - # cat cgroup-example-interface-file - default 125 - 8:16 170 - -- For events which are not very high frequency, an interface file - "events" should be created which lists event key value pairs. - Whenever a notifiable event happens, file modified event should be - generated on the file. - - -Core Interface Files --------------------- - -All cgroup core files are prefixed with "cgroup." - - cgroup.type - - A read-write single value file which exists on non-root - cgroups. - - When read, it indicates the current type of the cgroup, which - can be one of the following values. - - - "domain" : A normal valid domain cgroup. - - - "domain threaded" : A threaded domain cgroup which is - serving as the root of a threaded subtree. - - - "domain invalid" : A cgroup which is in an invalid state. - It can't be populated or have controllers enabled. It may - be allowed to become a threaded cgroup. - - - "threaded" : A threaded cgroup which is a member of a - threaded subtree. - - A cgroup can be turned into a threaded cgroup by writing - "threaded" to this file. - - cgroup.procs - A read-write new-line separated values file which exists on - all cgroups. - - When read, it lists the PIDs of all processes which belong to - the cgroup one-per-line. The PIDs are not ordered and the - same PID may show up more than once if the process got moved - to another cgroup and then back or the PID got recycled while - reading. - - A PID can be written to migrate the process associated with - the PID to the cgroup. The writer should match all of the - following conditions. - - - It must have write access to the "cgroup.procs" file. - - - It must have write access to the "cgroup.procs" file of the - common ancestor of the source and destination cgroups. - - When delegating a sub-hierarchy, write access to this file - should be granted along with the containing directory. - - In a threaded cgroup, reading this file fails with EOPNOTSUPP - as all the processes belong to the thread root. Writing is - supported and moves every thread of the process to the cgroup. - - cgroup.threads - A read-write new-line separated values file which exists on - all cgroups. - - When read, it lists the TIDs of all threads which belong to - the cgroup one-per-line. The TIDs are not ordered and the - same TID may show up more than once if the thread got moved to - another cgroup and then back or the TID got recycled while - reading. - - A TID can be written to migrate the thread associated with the - TID to the cgroup. The writer should match all of the - following conditions. - - - It must have write access to the "cgroup.threads" file. - - - The cgroup that the thread is currently in must be in the - same resource domain as the destination cgroup. - - - It must have write access to the "cgroup.procs" file of the - common ancestor of the source and destination cgroups. - - When delegating a sub-hierarchy, write access to this file - should be granted along with the containing directory. - - cgroup.controllers - A read-only space separated values file which exists on all - cgroups. - - It shows space separated list of all controllers available to - the cgroup. The controllers are not ordered. - - cgroup.subtree_control - A read-write space separated values file which exists on all - cgroups. Starts out empty. - - When read, it shows space separated list of the controllers - which are enabled to control resource distribution from the - cgroup to its children. - - Space separated list of controllers prefixed with '+' or '-' - can be written to enable or disable controllers. A controller - name prefixed with '+' enables the controller and '-' - disables. If a controller appears more than once on the list, - the last one is effective. When multiple enable and disable - operations are specified, either all succeed or all fail. - - cgroup.events - A read-only flat-keyed file which exists on non-root cgroups. - The following entries are defined. Unless specified - otherwise, a value change in this file generates a file - modified event. - - populated - 1 if the cgroup or its descendants contains any live - processes; otherwise, 0. - - cgroup.max.descendants - A read-write single value files. The default is "max". - - Maximum allowed number of descent cgroups. - If the actual number of descendants is equal or larger, - an attempt to create a new cgroup in the hierarchy will fail. - - cgroup.max.depth - A read-write single value files. The default is "max". - - Maximum allowed descent depth below the current cgroup. - If the actual descent depth is equal or larger, - an attempt to create a new child cgroup will fail. - - cgroup.stat - A read-only flat-keyed file with the following entries: - - nr_descendants - Total number of visible descendant cgroups. - - nr_dying_descendants - Total number of dying descendant cgroups. A cgroup becomes - dying after being deleted by a user. The cgroup will remain - in dying state for some time undefined time (which can depend - on system load) before being completely destroyed. - - A process can't enter a dying cgroup under any circumstances, - a dying cgroup can't revive. - - A dying cgroup can consume system resources not exceeding - limits, which were active at the moment of cgroup deletion. - - -Controllers -=========== - -CPU ---- - -The "cpu" controllers regulates distribution of CPU cycles. This -controller implements weight and absolute bandwidth limit models for -normal scheduling policy and absolute bandwidth allocation model for -realtime scheduling policy. - -WARNING: cgroup2 doesn't yet support control of realtime processes and -the cpu controller can only be enabled when all RT processes are in -the root cgroup. Be aware that system management software may already -have placed RT processes into nonroot cgroups during the system boot -process, and these processes may need to be moved to the root cgroup -before the cpu controller can be enabled. - - -CPU Interface Files -~~~~~~~~~~~~~~~~~~~ - -All time durations are in microseconds. - - cpu.stat - A read-only flat-keyed file which exists on non-root cgroups. - This file exists whether the controller is enabled or not. - - It always reports the following three stats: - - - usage_usec - - user_usec - - system_usec - - and the following three when the controller is enabled: - - - nr_periods - - nr_throttled - - throttled_usec - - cpu.weight - A read-write single value file which exists on non-root - cgroups. The default is "100". - - The weight in the range [1, 10000]. - - cpu.weight.nice - A read-write single value file which exists on non-root - cgroups. The default is "0". - - The nice value is in the range [-20, 19]. - - This interface file is an alternative interface for - "cpu.weight" and allows reading and setting weight using the - same values used by nice(2). Because the range is smaller and - granularity is coarser for the nice values, the read value is - the closest approximation of the current weight. - - cpu.max - A read-write two value file which exists on non-root cgroups. - The default is "max 100000". - - The maximum bandwidth limit. It's in the following format:: - - $MAX $PERIOD - - which indicates that the group may consume upto $MAX in each - $PERIOD duration. "max" for $MAX indicates no limit. If only - one number is written, $MAX is updated. - - -Memory ------- - -The "memory" controller regulates distribution of memory. Memory is -stateful and implements both limit and protection models. Due to the -intertwining between memory usage and reclaim pressure and the -stateful nature of memory, the distribution model is relatively -complex. - -While not completely water-tight, all major memory usages by a given -cgroup are tracked so that the total memory consumption can be -accounted and controlled to a reasonable extent. Currently, the -following types of memory usages are tracked. - -- Userland memory - page cache and anonymous memory. - -- Kernel data structures such as dentries and inodes. - -- TCP socket buffers. - -The above list may expand in the future for better coverage. - - -Memory Interface Files -~~~~~~~~~~~~~~~~~~~~~~ - -All memory amounts are in bytes. If a value which is not aligned to -PAGE_SIZE is written, the value may be rounded up to the closest -PAGE_SIZE multiple when read back. - - memory.current - A read-only single value file which exists on non-root - cgroups. - - The total amount of memory currently being used by the cgroup - and its descendants. - - memory.low - A read-write single value file which exists on non-root - cgroups. The default is "0". - - Best-effort memory protection. If the memory usages of a - cgroup and all its ancestors are below their low boundaries, - the cgroup's memory won't be reclaimed unless memory can be - reclaimed from unprotected cgroups. - - Putting more memory than generally available under this - protection is discouraged. - - memory.high - A read-write single value file which exists on non-root - cgroups. The default is "max". - - Memory usage throttle limit. This is the main mechanism to - control memory usage of a cgroup. If a cgroup's usage goes - over the high boundary, the processes of the cgroup are - throttled and put under heavy reclaim pressure. - - Going over the high limit never invokes the OOM killer and - under extreme conditions the limit may be breached. - - memory.max - A read-write single value file which exists on non-root - cgroups. The default is "max". - - Memory usage hard limit. This is the final protection - mechanism. If a cgroup's memory usage reaches this limit and - can't be reduced, the OOM killer is invoked in the cgroup. - Under certain circumstances, the usage may go over the limit - temporarily. - - This is the ultimate protection mechanism. As long as the - high limit is used and monitored properly, this limit's - utility is limited to providing the final safety net. - - memory.events - A read-only flat-keyed file which exists on non-root cgroups. - The following entries are defined. Unless specified - otherwise, a value change in this file generates a file - modified event. - - low - The number of times the cgroup is reclaimed due to - high memory pressure even though its usage is under - the low boundary. This usually indicates that the low - boundary is over-committed. - - high - The number of times processes of the cgroup are - throttled and routed to perform direct memory reclaim - because the high memory boundary was exceeded. For a - cgroup whose memory usage is capped by the high limit - rather than global memory pressure, this event's - occurrences are expected. - - max - The number of times the cgroup's memory usage was - about to go over the max boundary. If direct reclaim - fails to bring it down, the cgroup goes to OOM state. - - oom - The number of time the cgroup's memory usage was - reached the limit and allocation was about to fail. - - Depending on context result could be invocation of OOM - killer and retrying allocation or failing allocation. - - Failed allocation in its turn could be returned into - userspace as -ENOMEM or silently ignored in cases like - disk readahead. For now OOM in memory cgroup kills - tasks iff shortage has happened inside page fault. - - oom_kill - The number of processes belonging to this cgroup - killed by any kind of OOM killer. - - memory.stat - A read-only flat-keyed file which exists on non-root cgroups. - - This breaks down the cgroup's memory footprint into different - types of memory, type-specific details, and other information - on the state and past events of the memory management system. - - All memory amounts are in bytes. - - The entries are ordered to be human readable, and new entries - can show up in the middle. Don't rely on items remaining in a - fixed position; use the keys to look up specific values! - - anon - Amount of memory used in anonymous mappings such as - brk(), sbrk(), and mmap(MAP_ANONYMOUS) - - file - Amount of memory used to cache filesystem data, - including tmpfs and shared memory. - - kernel_stack - Amount of memory allocated to kernel stacks. - - slab - Amount of memory used for storing in-kernel data - structures. - - sock - Amount of memory used in network transmission buffers - - shmem - Amount of cached filesystem data that is swap-backed, - such as tmpfs, shm segments, shared anonymous mmap()s - - file_mapped - Amount of cached filesystem data mapped with mmap() - - file_dirty - Amount of cached filesystem data that was modified but - not yet written back to disk - - file_writeback - Amount of cached filesystem data that was modified and - is currently being written back to disk - - inactive_anon, active_anon, inactive_file, active_file, unevictable - Amount of memory, swap-backed and filesystem-backed, - on the internal memory management lists used by the - page reclaim algorithm - - slab_reclaimable - Part of "slab" that might be reclaimed, such as - dentries and inodes. - - slab_unreclaimable - Part of "slab" that cannot be reclaimed on memory - pressure. - - pgfault - Total number of page faults incurred - - pgmajfault - Number of major page faults incurred - - workingset_refault - - Number of refaults of previously evicted pages - - workingset_activate - - Number of refaulted pages that were immediately activated - - workingset_nodereclaim - - Number of times a shadow node has been reclaimed - - pgrefill - - Amount of scanned pages (in an active LRU list) - - pgscan - - Amount of scanned pages (in an inactive LRU list) - - pgsteal - - Amount of reclaimed pages - - pgactivate - - Amount of pages moved to the active LRU list - - pgdeactivate - - Amount of pages moved to the inactive LRU lis - - pglazyfree - - Amount of pages postponed to be freed under memory pressure - - pglazyfreed - - Amount of reclaimed lazyfree pages - - memory.swap.current - A read-only single value file which exists on non-root - cgroups. - - The total amount of swap currently being used by the cgroup - and its descendants. - - memory.swap.max - A read-write single value file which exists on non-root - cgroups. The default is "max". - - Swap usage hard limit. If a cgroup's swap usage reaches this - limit, anonymous memory of the cgroup will not be swapped out. - - -Usage Guidelines -~~~~~~~~~~~~~~~~ - -"memory.high" is the main mechanism to control memory usage. -Over-committing on high limit (sum of high limits > available memory) -and letting global memory pressure to distribute memory according to -usage is a viable strategy. - -Because breach of the high limit doesn't trigger the OOM killer but -throttles the offending cgroup, a management agent has ample -opportunities to monitor and take appropriate actions such as granting -more memory or terminating the workload. - -Determining whether a cgroup has enough memory is not trivial as -memory usage doesn't indicate whether the workload can benefit from -more memory. For example, a workload which writes data received from -network to a file can use all available memory but can also operate as -performant with a small amount of memory. A measure of memory -pressure - how much the workload is being impacted due to lack of -memory - is necessary to determine whether a workload needs more -memory; unfortunately, memory pressure monitoring mechanism isn't -implemented yet. - - -Memory Ownership -~~~~~~~~~~~~~~~~ - -A memory area is charged to the cgroup which instantiated it and stays -charged to the cgroup until the area is released. Migrating a process -to a different cgroup doesn't move the memory usages that it -instantiated while in the previous cgroup to the new cgroup. - -A memory area may be used by processes belonging to different cgroups. -To which cgroup the area will be charged is in-deterministic; however, -over time, the memory area is likely to end up in a cgroup which has -enough memory allowance to avoid high reclaim pressure. - -If a cgroup sweeps a considerable amount of memory which is expected -to be accessed repeatedly by other cgroups, it may make sense to use -POSIX_FADV_DONTNEED to relinquish the ownership of memory areas -belonging to the affected files to ensure correct memory ownership. - - -IO --- - -The "io" controller regulates the distribution of IO resources. This -controller implements both weight based and absolute bandwidth or IOPS -limit distribution; however, weight based distribution is available -only if cfq-iosched is in use and neither scheme is available for -blk-mq devices. - - -IO Interface Files -~~~~~~~~~~~~~~~~~~ - - io.stat - A read-only nested-keyed file which exists on non-root - cgroups. - - Lines are keyed by $MAJ:$MIN device numbers and not ordered. - The following nested keys are defined. - - ====== =================== - rbytes Bytes read - wbytes Bytes written - rios Number of read IOs - wios Number of write IOs - ====== =================== - - An example read output follows: - - 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 - 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 - - io.weight - A read-write flat-keyed file which exists on non-root cgroups. - The default is "default 100". - - The first line is the default weight applied to devices - without specific override. The rest are overrides keyed by - $MAJ:$MIN device numbers and not ordered. The weights are in - the range [1, 10000] and specifies the relative amount IO time - the cgroup can use in relation to its siblings. - - The default weight can be updated by writing either "default - $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing - "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". - - An example read output follows:: - - default 100 - 8:16 200 - 8:0 50 - - io.max - A read-write nested-keyed file which exists on non-root - cgroups. - - BPS and IOPS based IO limit. Lines are keyed by $MAJ:$MIN - device numbers and not ordered. The following nested keys are - defined. - - ===== ================================== - rbps Max read bytes per second - wbps Max write bytes per second - riops Max read IO operations per second - wiops Max write IO operations per second - ===== ================================== - - When writing, any number of nested key-value pairs can be - specified in any order. "max" can be specified as the value - to remove a specific limit. If the same key is specified - multiple times, the outcome is undefined. - - BPS and IOPS are measured in each IO direction and IOs are - delayed if limit is reached. Temporary bursts are allowed. - - Setting read limit at 2M BPS and write at 120 IOPS for 8:16:: - - echo "8:16 rbps=2097152 wiops=120" > io.max - - Reading returns the following:: - - 8:16 rbps=2097152 wbps=max riops=max wiops=120 - - Write IOPS limit can be removed by writing the following:: - - echo "8:16 wiops=max" > io.max - - Reading now returns the following:: - - 8:16 rbps=2097152 wbps=max riops=max wiops=max - - -Writeback -~~~~~~~~~ - -Page cache is dirtied through buffered writes and shared mmaps and -written asynchronously to the backing filesystem by the writeback -mechanism. Writeback sits between the memory and IO domains and -regulates the proportion of dirty memory by balancing dirtying and -write IOs. - -The io controller, in conjunction with the memory controller, -implements control of page cache writeback IOs. The memory controller -defines the memory domain that dirty memory ratio is calculated and -maintained for and the io controller defines the io domain which -writes out dirty pages for the memory domain. Both system-wide and -per-cgroup dirty memory states are examined and the more restrictive -of the two is enforced. - -cgroup writeback requires explicit support from the underlying -filesystem. Currently, cgroup writeback is implemented on ext2, ext4 -and btrfs. On other filesystems, all writeback IOs are attributed to -the root cgroup. - -There are inherent differences in memory and writeback management -which affects how cgroup ownership is tracked. Memory is tracked per -page while writeback per inode. For the purpose of writeback, an -inode is assigned to a cgroup and all IO requests to write dirty pages -from the inode are attributed to that cgroup. - -As cgroup ownership for memory is tracked per page, there can be pages -which are associated with different cgroups than the one the inode is -associated with. These are called foreign pages. The writeback -constantly keeps track of foreign pages and, if a particular foreign -cgroup becomes the majority over a certain period of time, switches -the ownership of the inode to that cgroup. - -While this model is enough for most use cases where a given inode is -mostly dirtied by a single cgroup even when the main writing cgroup -changes over time, use cases where multiple cgroups write to a single -inode simultaneously are not supported well. In such circumstances, a -significant portion of IOs are likely to be attributed incorrectly. -As memory controller assigns page ownership on the first use and -doesn't update it until the page is released, even if writeback -strictly follows page ownership, multiple cgroups dirtying overlapping -areas wouldn't work as expected. It's recommended to avoid such usage -patterns. - -The sysctl knobs which affect writeback behavior are applied to cgroup -writeback as follows. - - vm.dirty_background_ratio, vm.dirty_ratio - These ratios apply the same to cgroup writeback with the - amount of available memory capped by limits imposed by the - memory controller and system-wide clean memory. - - vm.dirty_background_bytes, vm.dirty_bytes - For cgroup writeback, this is calculated into ratio against - total available memory and applied the same way as - vm.dirty[_background]_ratio. - - -PID ---- - -The process number controller is used to allow a cgroup to stop any -new tasks from being fork()'d or clone()'d after a specified limit is -reached. - -The number of tasks in a cgroup can be exhausted in ways which other -controllers cannot prevent, thus warranting its own controller. For -example, a fork bomb is likely to exhaust the number of tasks before -hitting memory restrictions. - -Note that PIDs used in this controller refer to TIDs, process IDs as -used by the kernel. - - -PID Interface Files -~~~~~~~~~~~~~~~~~~~ - - pids.max - A read-write single value file which exists on non-root - cgroups. The default is "max". - - Hard limit of number of processes. - - pids.current - A read-only single value file which exists on all cgroups. - - The number of processes currently in the cgroup and its - descendants. - -Organisational operations are not blocked by cgroup policies, so it is -possible to have pids.current > pids.max. This can be done by either -setting the limit to be smaller than pids.current, or attaching enough -processes to the cgroup such that pids.current is larger than -pids.max. However, it is not possible to violate a cgroup PID policy -through fork() or clone(). These will return -EAGAIN if the creation -of a new process would cause a cgroup policy to be violated. - - -Device controller ------------------ - -Device controller manages access to device files. It includes both -creation of new device files (using mknod), and access to the -existing device files. - -Cgroup v2 device controller has no interface files and is implemented -on top of cgroup BPF. To control access to device files, a user may -create bpf programs of the BPF_CGROUP_DEVICE type and attach them -to cgroups. On an attempt to access a device file, corresponding -BPF programs will be executed, and depending on the return value -the attempt will succeed or fail with -EPERM. - -A BPF_CGROUP_DEVICE program takes a pointer to the bpf_cgroup_dev_ctx -structure, which describes the device access attempt: access type -(mknod/read/write) and device (type, major and minor numbers). -If the program returns 0, the attempt fails with -EPERM, otherwise -it succeeds. - -An example of BPF_CGROUP_DEVICE program may be found in the kernel -source tree in the tools/testing/selftests/bpf/dev_cgroup.c file. - - -RDMA ----- - -The "rdma" controller regulates the distribution and accounting of -of RDMA resources. - -RDMA Interface Files -~~~~~~~~~~~~~~~~~~~~ - - rdma.max - A readwrite nested-keyed file that exists for all the cgroups - except root that describes current configured resource limit - for a RDMA/IB device. - - Lines are keyed by device name and are not ordered. - Each line contains space separated resource name and its configured - limit that can be distributed. - - The following nested keys are defined. - - ========== ============================= - hca_handle Maximum number of HCA Handles - hca_object Maximum number of HCA Objects - ========== ============================= - - An example for mlx4 and ocrdma device follows:: - - mlx4_0 hca_handle=2 hca_object=2000 - ocrdma1 hca_handle=3 hca_object=max - - rdma.current - A read-only file that describes current resource usage. - It exists for all the cgroup except root. - - An example for mlx4 and ocrdma device follows:: - - mlx4_0 hca_handle=1 hca_object=20 - ocrdma1 hca_handle=1 hca_object=23 - - -Misc ----- - -perf_event -~~~~~~~~~~ - -perf_event controller, if not mounted on a legacy hierarchy, is -automatically enabled on the v2 hierarchy so that perf events can -always be filtered by cgroup v2 path. The controller can still be -moved to a legacy hierarchy after v2 hierarchy is populated. - - -Non-normative information -------------------------- - -This section contains information that isn't considered to be a part of -the stable kernel API and so is subject to change. - - -CPU controller root cgroup process behaviour -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When distributing CPU cycles in the root cgroup each thread in this -cgroup is treated as if it was hosted in a separate child cgroup of the -root cgroup. This child cgroup weight is dependent on its thread nice -level. - -For details of this mapping see sched_prio_to_weight array in -kernel/sched/core.c file (values from this array should be scaled -appropriately so the neutral - nice 0 - value is 100 instead of 1024). - - -IO controller root cgroup process behaviour -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Root cgroup processes are hosted in an implicit leaf child node. -When distributing IO resources this implicit child node is taken into -account as if it was a normal child cgroup of the root cgroup with a -weight value of 200. - - -Namespace -========= - -Basics ------- - -cgroup namespace provides a mechanism to virtualize the view of the -"/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone -flag can be used with clone(2) and unshare(2) to create a new cgroup -namespace. The process running inside the cgroup namespace will have -its "/proc/$PID/cgroup" output restricted to cgroupns root. The -cgroupns root is the cgroup of the process at the time of creation of -the cgroup namespace. - -Without cgroup namespace, the "/proc/$PID/cgroup" file shows the -complete path of the cgroup of a process. In a container setup where -a set of cgroups and namespaces are intended to isolate processes the -"/proc/$PID/cgroup" file may leak potential system level information -to the isolated processes. For Example:: - - # cat /proc/self/cgroup - 0::/batchjobs/container_id1 - -The path '/batchjobs/container_id1' can be considered as system-data -and undesirable to expose to the isolated processes. cgroup namespace -can be used to restrict visibility of this path. For example, before -creating a cgroup namespace, one would see:: - - # ls -l /proc/self/ns/cgroup - lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] - # cat /proc/self/cgroup - 0::/batchjobs/container_id1 - -After unsharing a new namespace, the view changes:: - - # ls -l /proc/self/ns/cgroup - lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] - # cat /proc/self/cgroup - 0::/ - -When some thread from a multi-threaded process unshares its cgroup -namespace, the new cgroupns gets applied to the entire process (all -the threads). This is natural for the v2 hierarchy; however, for the -legacy hierarchies, this may be unexpected. - -A cgroup namespace is alive as long as there are processes inside or -mounts pinning it. When the last usage goes away, the cgroup -namespace is destroyed. The cgroupns root and the actual cgroups -remain. - - -The Root and Views ------------------- - -The 'cgroupns root' for a cgroup namespace is the cgroup in which the -process calling unshare(2) is running. For example, if a process in -/batchjobs/container_id1 cgroup calls unshare, cgroup -/batchjobs/container_id1 becomes the cgroupns root. For the -init_cgroup_ns, this is the real root ('/') cgroup. - -The cgroupns root cgroup does not change even if the namespace creator -process later moves to a different cgroup:: - - # ~/unshare -c # unshare cgroupns in some cgroup - # cat /proc/self/cgroup - 0::/ - # mkdir sub_cgrp_1 - # echo 0 > sub_cgrp_1/cgroup.procs - # cat /proc/self/cgroup - 0::/sub_cgrp_1 - -Each process gets its namespace-specific view of "/proc/$PID/cgroup" - -Processes running inside the cgroup namespace will be able to see -cgroup paths (in /proc/self/cgroup) only inside their root cgroup. -From within an unshared cgroupns:: - - # sleep 100000 & - [1] 7353 - # echo 7353 > sub_cgrp_1/cgroup.procs - # cat /proc/7353/cgroup - 0::/sub_cgrp_1 - -From the initial cgroup namespace, the real cgroup path will be -visible:: - - $ cat /proc/7353/cgroup - 0::/batchjobs/container_id1/sub_cgrp_1 - -From a sibling cgroup namespace (that is, a namespace rooted at a -different cgroup), the cgroup path relative to its own cgroup -namespace root will be shown. For instance, if PID 7353's cgroup -namespace root is at '/batchjobs/container_id2', then it will see:: - - # cat /proc/7353/cgroup - 0::/../container_id2/sub_cgrp_1 - -Note that the relative path always starts with '/' to indicate that -its relative to the cgroup namespace root of the caller. - - -Migration and setns(2) ----------------------- - -Processes inside a cgroup namespace can move into and out of the -namespace root if they have proper access to external cgroups. For -example, from inside a namespace with cgroupns root at -/batchjobs/container_id1, and assuming that the global hierarchy is -still accessible inside cgroupns:: - - # cat /proc/7353/cgroup - 0::/sub_cgrp_1 - # echo 7353 > batchjobs/container_id2/cgroup.procs - # cat /proc/7353/cgroup - 0::/../container_id2 - -Note that this kind of setup is not encouraged. A task inside cgroup -namespace should only be exposed to its own cgroupns hierarchy. - -setns(2) to another cgroup namespace is allowed when: - -(a) the process has CAP_SYS_ADMIN against its current user namespace -(b) the process has CAP_SYS_ADMIN against the target cgroup - namespace's userns - -No implicit cgroup changes happen with attaching to another cgroup -namespace. It is expected that the someone moves the attaching -process under the target cgroup namespace root. - - -Interaction with Other Namespaces ---------------------------------- - -Namespace specific cgroup hierarchy can be mounted by a process -running inside a non-init cgroup namespace:: - - # mount -t cgroup2 none $MOUNT_POINT - -This will mount the unified cgroup hierarchy with cgroupns root as the -filesystem root. The process needs CAP_SYS_ADMIN against its user and -mount namespaces. - -The virtualization of /proc/self/cgroup file combined with restricting -the view of cgroup hierarchy by namespace-private cgroupfs mount -provides a properly isolated cgroup view inside the container. - - -Information on Kernel Programming -================================= - -This section contains kernel programming information in the areas -where interacting with cgroup is necessary. cgroup core and -controllers are not covered. - - -Filesystem Support for Writeback --------------------------------- - -A filesystem can support cgroup writeback by updating -address_space_operations->writepage[s]() to annotate bio's using the -following two functions. - - wbc_init_bio(@wbc, @bio) - Should be called for each bio carrying writeback data and - associates the bio with the inode's owner cgroup. Can be - called anytime between bio allocation and submission. - - wbc_account_io(@wbc, @page, @bytes) - Should be called for each data segment being written out. - While this function doesn't care exactly when it's called - during the writeback session, it's the easiest and most - natural to call it as data segments are added to a bio. - -With writeback bio's annotated, cgroup support can be enabled per -super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for -selective disabling of cgroup writeback support which is helpful when -certain filesystem features, e.g. journaled data mode, are -incompatible. - -wbc_init_bio() binds the specified bio to its cgroup. Depending on -the configuration, the bio may be executed at a lower priority and if -the writeback session is holding shared resources, e.g. a journal -entry, may lead to priority inversion. There is no one easy solution -for the problem. Filesystems can try to work around specific problem -cases by skipping wbc_init_bio() or using bio_associate_blkcg() -directly. - - -Deprecated v1 Core Features -=========================== - -- Multiple hierarchies including named ones are not supported. - -- All v1 mount options are not supported. - -- The "tasks" file is removed and "cgroup.procs" is not sorted. - -- "cgroup.clone_children" is removed. - -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file - at the root instead. - - -Issues with v1 and Rationales for v2 -==================================== - -Multiple Hierarchies --------------------- - -cgroup v1 allowed an arbitrary number of hierarchies and each -hierarchy could host any number of controllers. While this seemed to -provide a high level of flexibility, it wasn't useful in practice. - -For example, as there is only one instance of each controller, utility -type controllers such as freezer which can be useful in all -hierarchies could only be used in one. The issue is exacerbated by -the fact that controllers couldn't be moved to another hierarchy once -hierarchies were populated. Another issue was that all controllers -bound to a hierarchy were forced to have exactly the same view of the -hierarchy. It wasn't possible to vary the granularity depending on -the specific controller. - -In practice, these issues heavily limited which controllers could be -put on the same hierarchy and most configurations resorted to putting -each controller on its own hierarchy. Only closely related ones, such -as the cpu and cpuacct controllers, made sense to be put on the same -hierarchy. This often meant that userland ended up managing multiple -similar hierarchies repeating the same steps on each hierarchy -whenever a hierarchy management operation was necessary. - -Furthermore, support for multiple hierarchies came at a steep cost. -It greatly complicated cgroup core implementation but more importantly -the support for multiple hierarchies restricted how cgroup could be -used in general and what controllers was able to do. - -There was no limit on how many hierarchies there might be, which meant -that a thread's cgroup membership couldn't be described in finite -length. The key might contain any number of entries and was unlimited -in length, which made it highly awkward to manipulate and led to -addition of controllers which existed only to identify membership, -which in turn exacerbated the original problem of proliferating number -of hierarchies. - -Also, as a controller couldn't have any expectation regarding the -topologies of hierarchies other controllers might be on, each -controller had to assume that all other controllers were attached to -completely orthogonal hierarchies. This made it impossible, or at -least very cumbersome, for controllers to cooperate with each other. - -In most use cases, putting controllers on hierarchies which are -completely orthogonal to each other isn't necessary. What usually is -called for is the ability to have differing levels of granularity -depending on the specific controller. In other words, hierarchy may -be collapsed from leaf towards root when viewed from specific -controllers. For example, a given configuration might not care about -how memory is distributed beyond a certain level while still wanting -to control how CPU cycles are distributed. - - -Thread Granularity ------------------- - -cgroup v1 allowed threads of a process to belong to different cgroups. -This didn't make sense for some controllers and those controllers -ended up implementing different ways to ignore such situations but -much more importantly it blurred the line between API exposed to -individual applications and system management interface. - -Generally, in-process knowledge is available only to the process -itself; thus, unlike service-level organization of processes, -categorizing threads of a process requires active participation from -the application which owns the target process. - -cgroup v1 had an ambiguously defined delegation model which got abused -in combination with thread granularity. cgroups were delegated to -individual applications so that they can create and manage their own -sub-hierarchies and control resource distributions along them. This -effectively raised cgroup to the status of a syscall-like API exposed -to lay programs. - -First of all, cgroup has a fundamentally inadequate interface to be -exposed this way. For a process to access its own knobs, it has to -extract the path on the target hierarchy from /proc/self/cgroup, -construct the path by appending the name of the knob to the path, open -and then read and/or write to it. This is not only extremely clunky -and unusual but also inherently racy. There is no conventional way to -define transaction across the required steps and nothing can guarantee -that the process would actually be operating on its own sub-hierarchy. - -cgroup controllers implemented a number of knobs which would never be -accepted as public APIs because they were just adding control knobs to -system-management pseudo filesystem. cgroup ended up with interface -knobs which were not properly abstracted or refined and directly -revealed kernel internal details. These knobs got exposed to -individual applications through the ill-defined delegation mechanism -effectively abusing cgroup as a shortcut to implementing public APIs -without going through the required scrutiny. - -This was painful for both userland and kernel. Userland ended up with -misbehaving and poorly abstracted interfaces and kernel exposing and -locked into constructs inadvertently. - - -Competition Between Inner Nodes and Threads -------------------------------------------- - -cgroup v1 allowed threads to be in any cgroups which created an -interesting problem where threads belonging to a parent cgroup and its -children cgroups competed for resources. This was nasty as two -different types of entities competed and there was no obvious way to -settle it. Different controllers did different things. - -The cpu controller considered threads and cgroups as equivalents and -mapped nice levels to cgroup weights. This worked for some cases but -fell flat when children wanted to be allocated specific ratios of CPU -cycles and the number of internal threads fluctuated - the ratios -constantly changed as the number of competing entities fluctuated. -There also were other issues. The mapping from nice level to weight -wasn't obvious or universal, and there were various other knobs which -simply weren't available for threads. - -The io controller implicitly created a hidden leaf node for each -cgroup to host the threads. The hidden leaf had its own copies of all -the knobs with ``leaf_`` prefixed. While this allowed equivalent -control over internal threads, it was with serious drawbacks. It -always added an extra layer of nesting which wouldn't be necessary -otherwise, made the interface messy and significantly complicated the -implementation. - -The memory controller didn't have a way to control what happened -between internal tasks and child cgroups and the behavior was not -clearly defined. There were attempts to add ad-hoc behaviors and -knobs to tailor the behavior to specific workloads which would have -led to problems extremely difficult to resolve in the long term. - -Multiple controllers struggled with internal tasks and came up with -different ways to deal with it; unfortunately, all the approaches were -severely flawed and, furthermore, the widely different behaviors -made cgroup as a whole highly inconsistent. - -This clearly is a problem which needs to be addressed from cgroup core -in a uniform way. - - -Other Interface Issues ----------------------- - -cgroup v1 grew without oversight and developed a large number of -idiosyncrasies and inconsistencies. One issue on the cgroup core side -was how an empty cgroup was notified - a userland helper binary was -forked and executed for each event. The event delivery wasn't -recursive or delegatable. The limitations of the mechanism also led -to in-kernel event delivery filtering mechanism further complicating -the interface. - -Controller interfaces were problematic too. An extreme example is -controllers completely ignoring hierarchical organization and treating -all cgroups as if they were all located directly under the root -cgroup. Some controllers exposed a large amount of inconsistent -implementation details to userland. - -There also was no consistency across controllers. When a new cgroup -was created, some controllers defaulted to not imposing extra -restrictions while others disallowed any resource usage until -explicitly configured. Configuration knobs for the same type of -control used widely differing naming schemes and formats. Statistics -and information knobs were named arbitrarily and used different -formats and units even in the same controller. - -cgroup v2 establishes common conventions where appropriate and updates -controllers so that they expose minimal and consistent interfaces. - - -Controller Issues and Remedies ------------------------------- - -Memory -~~~~~~ - -The original lower boundary, the soft limit, is defined as a limit -that is per default unset. As a result, the set of cgroups that -global reclaim prefers is opt-in, rather than opt-out. The costs for -optimizing these mostly negative lookups are so high that the -implementation, despite its enormous size, does not even provide the -basic desirable behavior. First off, the soft limit has no -hierarchical meaning. All configured groups are organized in a global -rbtree and treated like equal peers, regardless where they are located -in the hierarchy. This makes subtree delegation impossible. Second, -the soft limit reclaim pass is so aggressive that it not just -introduces high allocation latencies into the system, but also impacts -system performance due to overreclaim, to the point where the feature -becomes self-defeating. - -The memory.low boundary on the other hand is a top-down allocated -reserve. A cgroup enjoys reclaim protection when it and all its -ancestors are below their low boundaries, which makes delegation of -subtrees possible. Secondly, new cgroups have no reserve per default -and in the common case most cgroups are eligible for the preferred -reclaim pass. This allows the new low boundary to be efficiently -implemented with just a minor addition to the generic reclaim code, -without the need for out-of-band data structures and reclaim passes. -Because the generic reclaim code considers all cgroups except for the -ones running low in the preferred first reclaim pass, overreclaim of -individual groups is eliminated as well, resulting in much better -overall workload performance. - -The original high boundary, the hard limit, is defined as a strict -limit that can not budge, even if the OOM killer has to be called. -But this generally goes against the goal of making the most out of the -available memory. The memory consumption of workloads varies during -runtime, and that requires users to overcommit. But doing that with a -strict upper limit requires either a fairly accurate prediction of the -working set size or adding slack to the limit. Since working set size -estimation is hard and error prone, and getting it wrong results in -OOM kills, most users tend to err on the side of a looser limit and -end up wasting precious resources. - -The memory.high boundary on the other hand can be set much more -conservatively. When hit, it throttles allocations by forcing them -into direct reclaim to work off the excess, but it never invokes the -OOM killer. As a result, a high boundary that is chosen too -aggressively will not terminate the processes, but instead it will -lead to gradual performance degradation. The user can monitor this -and make corrections until the minimal memory footprint that still -gives acceptable performance is found. - -In extreme cases, with many concurrent allocations and a complete -breakdown of reclaim progress within the group, the high boundary can -be exceeded. But even then it's mostly better to satisfy the -allocation from the slack available in other groups or the rest of the -system than killing the group. Otherwise, memory.max is there to -limit this type of spillover and ultimately contain buggy or even -malicious applications. - -Setting the original memory.limit_in_bytes below the current usage was -subject to a race condition, where concurrent charges could cause the -limit setting to fail. memory.max on the other hand will first set the -limit to prevent new charges, and then reclaim and OOM kill until the -new limit is met - or the task writing to memory.max is killed. - -The combined memory+swap accounting and limiting is replaced by real -control over swap space. - -The main argument for a combined memory+swap facility in the original -cgroup design was that global or parental pressure would always be -able to swap all anonymous memory of a child group, regardless of the -child's own (possibly untrusted) configuration. However, untrusted -groups can sabotage swapping by other means - such as referencing its -anonymous memory in a tight loop - and an admin can not assume full -swappability when overcommitting untrusted jobs. - -For trusted jobs, on the other hand, a combined counter is not an -intuitive userspace interface, and it flies in the face of the idea -that cgroup controllers should account and limit specific physical -resources. Swap space is a resource like all others in the system, -and that's why unified hierarchy allows distributing it separately. diff --git a/Documentation/clk.txt b/Documentation/clk.txt deleted file mode 100644 index 511628bb3d3a25fcc6ffb3a8d07d2696622d47ab..0000000000000000000000000000000000000000 --- a/Documentation/clk.txt +++ /dev/null @@ -1,307 +0,0 @@ -======================== -The Common Clk Framework -======================== - -:Author: Mike Turquette - -This document endeavours to explain the common clk framework details, -and how to port a platform over to this framework. It is not yet a -detailed explanation of the clock api in include/linux/clk.h, but -perhaps someday it will include that information. - -Introduction and interface split -================================ - -The common clk framework is an interface to control the clock nodes -available on various devices today. This may come in the form of clock -gating, rate adjustment, muxing or other operations. This framework is -enabled with the CONFIG_COMMON_CLK option. - -The interface itself is divided into two halves, each shielded from the -details of its counterpart. First is the common definition of struct -clk which unifies the framework-level accounting and infrastructure that -has traditionally been duplicated across a variety of platforms. Second -is a common implementation of the clk.h api, defined in -drivers/clk/clk.c. Finally there is struct clk_ops, whose operations -are invoked by the clk api implementation. - -The second half of the interface is comprised of the hardware-specific -callbacks registered with struct clk_ops and the corresponding -hardware-specific structures needed to model a particular clock. For -the remainder of this document any reference to a callback in struct -clk_ops, such as .enable or .set_rate, implies the hardware-specific -implementation of that code. Likewise, references to struct clk_foo -serve as a convenient shorthand for the implementation of the -hardware-specific bits for the hypothetical "foo" hardware. - -Tying the two halves of this interface together is struct clk_hw, which -is defined in struct clk_foo and pointed to within struct clk_core. This -allows for easy navigation between the two discrete halves of the common -clock interface. - -Common data structures and api -============================== - -Below is the common struct clk_core definition from -drivers/clk/clk.c, modified for brevity:: - - struct clk_core { - const char *name; - const struct clk_ops *ops; - struct clk_hw *hw; - struct module *owner; - struct clk_core *parent; - const char **parent_names; - struct clk_core **parents; - u8 num_parents; - u8 new_parent_index; - ... - }; - -The members above make up the core of the clk tree topology. The clk -api itself defines several driver-facing functions which operate on -struct clk. That api is documented in include/linux/clk.h. - -Platforms and devices utilizing the common struct clk_core use the struct -clk_ops pointer in struct clk_core to perform the hardware-specific parts of -the operations defined in clk-provider.h:: - - struct clk_ops { - int (*prepare)(struct clk_hw *hw); - void (*unprepare)(struct clk_hw *hw); - int (*is_prepared)(struct clk_hw *hw); - void (*unprepare_unused)(struct clk_hw *hw); - int (*enable)(struct clk_hw *hw); - void (*disable)(struct clk_hw *hw); - int (*is_enabled)(struct clk_hw *hw); - void (*disable_unused)(struct clk_hw *hw); - unsigned long (*recalc_rate)(struct clk_hw *hw, - unsigned long parent_rate); - long (*round_rate)(struct clk_hw *hw, - unsigned long rate, - unsigned long *parent_rate); - int (*determine_rate)(struct clk_hw *hw, - struct clk_rate_request *req); - int (*set_parent)(struct clk_hw *hw, u8 index); - u8 (*get_parent)(struct clk_hw *hw); - int (*set_rate)(struct clk_hw *hw, - unsigned long rate, - unsigned long parent_rate); - int (*set_rate_and_parent)(struct clk_hw *hw, - unsigned long rate, - unsigned long parent_rate, - u8 index); - unsigned long (*recalc_accuracy)(struct clk_hw *hw, - unsigned long parent_accuracy); - int (*get_phase)(struct clk_hw *hw); - int (*set_phase)(struct clk_hw *hw, int degrees); - void (*init)(struct clk_hw *hw); - int (*debug_init)(struct clk_hw *hw, - struct dentry *dentry); - }; - -Hardware clk implementations -============================ - -The strength of the common struct clk_core comes from its .ops and .hw pointers -which abstract the details of struct clk from the hardware-specific bits, and -vice versa. To illustrate consider the simple gateable clk implementation in -drivers/clk/clk-gate.c:: - - struct clk_gate { - struct clk_hw hw; - void __iomem *reg; - u8 bit_idx; - ... - }; - -struct clk_gate contains struct clk_hw hw as well as hardware-specific -knowledge about which register and bit controls this clk's gating. -Nothing about clock topology or accounting, such as enable_count or -notifier_count, is needed here. That is all handled by the common -framework code and struct clk_core. - -Let's walk through enabling this clk from driver code:: - - struct clk *clk; - clk = clk_get(NULL, "my_gateable_clk"); - - clk_prepare(clk); - clk_enable(clk); - -The call graph for clk_enable is very simple:: - - clk_enable(clk); - clk->ops->enable(clk->hw); - [resolves to...] - clk_gate_enable(hw); - [resolves struct clk gate with to_clk_gate(hw)] - clk_gate_set_bit(gate); - -And the definition of clk_gate_set_bit:: - - static void clk_gate_set_bit(struct clk_gate *gate) - { - u32 reg; - - reg = __raw_readl(gate->reg); - reg |= BIT(gate->bit_idx); - writel(reg, gate->reg); - } - -Note that to_clk_gate is defined as:: - - #define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw) - -This pattern of abstraction is used for every clock hardware -representation. - -Supporting your own clk hardware -================================ - -When implementing support for a new type of clock it is only necessary to -include the following header:: - - #include - -To construct a clk hardware structure for your platform you must define -the following:: - - struct clk_foo { - struct clk_hw hw; - ... hardware specific data goes here ... - }; - -To take advantage of your data you'll need to support valid operations -for your clk:: - - struct clk_ops clk_foo_ops { - .enable = &clk_foo_enable; - .disable = &clk_foo_disable; - }; - -Implement the above functions using container_of:: - - #define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw) - - int clk_foo_enable(struct clk_hw *hw) - { - struct clk_foo *foo; - - foo = to_clk_foo(hw); - - ... perform magic on foo ... - - return 0; - }; - -Below is a matrix detailing which clk_ops are mandatory based upon the -hardware capabilities of that clock. A cell marked as "y" means -mandatory, a cell marked as "n" implies that either including that -callback is invalid or otherwise unnecessary. Empty cells are either -optional or must be evaluated on a case-by-case basis. - -.. table:: clock hardware characteristics - - +----------------+------+-------------+---------------+-------------+------+ - | | gate | change rate | single parent | multiplexer | root | - +================+======+=============+===============+=============+======+ - |.prepare | | | | | | - +----------------+------+-------------+---------------+-------------+------+ - |.unprepare | | | | | | - +----------------+------+-------------+---------------+-------------+------+ - +----------------+------+-------------+---------------+-------------+------+ - |.enable | y | | | | | - +----------------+------+-------------+---------------+-------------+------+ - |.disable | y | | | | | - +----------------+------+-------------+---------------+-------------+------+ - |.is_enabled | y | | | | | - +----------------+------+-------------+---------------+-------------+------+ - +----------------+------+-------------+---------------+-------------+------+ - |.recalc_rate | | y | | | | - +----------------+------+-------------+---------------+-------------+------+ - |.round_rate | | y [1]_ | | | | - +----------------+------+-------------+---------------+-------------+------+ - |.determine_rate | | y [1]_ | | | | - +----------------+------+-------------+---------------+-------------+------+ - |.set_rate | | y | | | | - +----------------+------+-------------+---------------+-------------+------+ - +----------------+------+-------------+---------------+-------------+------+ - |.set_parent | | | n | y | n | - +----------------+------+-------------+---------------+-------------+------+ - |.get_parent | | | n | y | n | - +----------------+------+-------------+---------------+-------------+------+ - +----------------+------+-------------+---------------+-------------+------+ - |.recalc_accuracy| | | | | | - +----------------+------+-------------+---------------+-------------+------+ - +----------------+------+-------------+---------------+-------------+------+ - |.init | | | | | | - +----------------+------+-------------+---------------+-------------+------+ - -.. [1] either one of round_rate or determine_rate is required. - -Finally, register your clock at run-time with a hardware-specific -registration function. This function simply populates struct clk_foo's -data and then passes the common struct clk parameters to the framework -with a call to:: - - clk_register(...) - -See the basic clock types in ``drivers/clk/clk-*.c`` for examples. - -Disabling clock gating of unused clocks -======================================= - -Sometimes during development it can be useful to be able to bypass the -default disabling of unused clocks. For example, if drivers aren't enabling -clocks properly but rely on them being on from the bootloader, bypassing -the disabling means that the driver will remain functional while the issues -are sorted out. - -To bypass this disabling, include "clk_ignore_unused" in the bootargs to the -kernel. - -Locking -======= - -The common clock framework uses two global locks, the prepare lock and the -enable lock. - -The enable lock is a spinlock and is held across calls to the .enable, -.disable operations. Those operations are thus not allowed to sleep, -and calls to the clk_enable(), clk_disable() API functions are allowed in -atomic context. - -For clk_is_enabled() API, it is also designed to be allowed to be used in -atomic context. However, it doesn't really make any sense to hold the enable -lock in core, unless you want to do something else with the information of -the enable state with that lock held. Otherwise, seeing if a clk is enabled is -a one-shot read of the enabled state, which could just as easily change after -the function returns because the lock is released. Thus the user of this API -needs to handle synchronizing the read of the state with whatever they're -using it for to make sure that the enable state doesn't change during that -time. - -The prepare lock is a mutex and is held across calls to all other operations. -All those operations are allowed to sleep, and calls to the corresponding API -functions are not allowed in atomic context. - -This effectively divides operations in two groups from a locking perspective. - -Drivers don't need to manually protect resources shared between the operations -of one group, regardless of whether those resources are shared by multiple -clocks or not. However, access to resources that are shared between operations -of the two groups needs to be protected by the drivers. An example of such a -resource would be a register that controls both the clock rate and the clock -enable/disable state. - -The clock framework is reentrant, in that a driver is allowed to call clock -framework functions from within its implementation of clock operations. This -can for instance cause a .set_rate operation of one clock being called from -within the .set_rate operation of another clock. This case must be considered -in the driver implementations, but the code flow is usually controlled by the -driver in that case. - -Note that locking must also be considered when code outside of the common -clock framework needs to access resources used by the clock operations. This -is considered out of scope of this document. diff --git a/Documentation/core-api/atomic_ops.rst b/Documentation/core-api/atomic_ops.rst index fce929144ccdc3de876b94cff93bab7376845b02..2e7165f86f553a65eafc16c92431ff0b6e14927e 100644 --- a/Documentation/core-api/atomic_ops.rst +++ b/Documentation/core-api/atomic_ops.rst @@ -111,7 +111,6 @@ If the compiler can prove that do_something() does not store to the variable a, then the compiler is within its rights transforming this to the following:: - tmp = a; if (a > 0) for (;;) do_something(); @@ -119,7 +118,7 @@ the following:: If you don't want the compiler to do this (and you probably don't), then you should use something like the following:: - while (READ_ONCE(a) < 0) + while (READ_ONCE(a) > 0) do_something(); Alternatively, you could place a barrier() call in the loop. @@ -467,10 +466,12 @@ Like the above, except that these routines return a boolean which indicates whether the changed bit was set _BEFORE_ the atomic bit operation. -WARNING! It is incredibly important that the value be a boolean, -ie. "0" or "1". Do not try to be fancy and save a few instructions by -declaring the above to return "long" and just returning something like -"old_val & mask" because that will not work. + +.. warning:: + It is incredibly important that the value be a boolean, ie. "0" or "1". + Do not try to be fancy and save a few instructions by declaring the + above to return "long" and just returning something like "old_val & + mask" because that will not work. For one thing, this return value gets truncated to int in many code paths using these interfaces, so on 64-bit if the bit is set in the diff --git a/Documentation/cachetlb.txt b/Documentation/core-api/cachetlb.rst similarity index 100% rename from Documentation/cachetlb.txt rename to Documentation/core-api/cachetlb.rst diff --git a/Documentation/circular-buffers.txt b/Documentation/core-api/circular-buffers.rst similarity index 100% rename from Documentation/circular-buffers.txt rename to Documentation/core-api/circular-buffers.rst diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst b/Documentation/core-api/gfp_mask-from-fs-io.rst new file mode 100644 index 0000000000000000000000000000000000000000..e0df8f4165826a7049c332fd25ed8a86537b34c4 --- /dev/null +++ b/Documentation/core-api/gfp_mask-from-fs-io.rst @@ -0,0 +1,66 @@ +================================= +GFP masks used from FS/IO context +================================= + +:Date: May, 2018 +:Author: Michal Hocko + +Introduction +============ + +Code paths in the filesystem and IO stacks must be careful when +allocating memory to prevent recursion deadlocks caused by direct +memory reclaim calling back into the FS or IO paths and blocking on +already held resources (e.g. locks - most commonly those used for the +transaction context). + +The traditional way to avoid this deadlock problem is to clear __GFP_FS +respectively __GFP_IO (note the latter implies clearing the first as well) in +the gfp mask when calling an allocator. GFP_NOFS respectively GFP_NOIO can be +used as shortcut. It turned out though that above approach has led to +abuses when the restricted gfp mask is used "just in case" without a +deeper consideration which leads to problems because an excessive use +of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory +reclaim issues. + +New API +======== + +Since 4.12 we do have a generic scope API for both NOFS and NOIO context +``memalloc_nofs_save``, ``memalloc_nofs_restore`` respectively ``memalloc_noio_save``, +``memalloc_noio_restore`` which allow to mark a scope to be a critical +section from a filesystem or I/O point of view. Any allocation from that +scope will inherently drop __GFP_FS respectively __GFP_IO from the given +mask so no memory allocation can recurse back in the FS/IO. + +.. kernel-doc:: include/linux/sched/mm.h + :functions: memalloc_nofs_save memalloc_nofs_restore +.. kernel-doc:: include/linux/sched/mm.h + :functions: memalloc_noio_save memalloc_noio_restore + +FS/IO code then simply calls the appropriate save function before +any critical section with respect to the reclaim is started - e.g. +lock shared with the reclaim context or when a transaction context +nesting would be possible via reclaim. The restore function should be +called when the critical section ends. All that ideally along with an +explanation what is the reclaim context for easier maintenance. + +Please note that the proper pairing of save/restore functions +allows nesting so it is safe to call ``memalloc_noio_save`` or +``memalloc_noio_restore`` respectively from an existing NOIO or NOFS +scope. + +What about __vmalloc(GFP_NOFS) +============================== + +vmalloc doesn't support GFP_NOFS semantic because there are hardcoded +GFP_KERNEL allocations deep inside the allocator which are quite non-trivial +to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is +almost always a bug. The good news is that the NOFS/NOIO semantic can be +achieved by the scope API. + +In the ideal world, upper layers should already mark dangerous contexts +and so no special care is required and vmalloc should be called without +any problems. Sometimes if the context is not really clear or there are +layering violations then the recommended way around that is to wrap ``vmalloc`` +by the scope API with a comment explaining the problem. diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index c670a80317862d05f4f60818844b6bcb7cfaa1a0..f5a66b72f9848cfb399b0694463ea1ec5eacb17b 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -14,6 +14,7 @@ Core utilities kernel-api assoc_array atomic_ops + cachetlb refcount-vs-atomic cpu_hotplug idr @@ -25,6 +26,8 @@ Core utilities genalloc errseq printk-formats + circular-buffers + gfp_mask-from-fs-io Interfaces for kernel debugging =============================== diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst index 92f30006adae48846ab89684e8d5c3449ccbe341..76fe2d0f5e7d7db307bfa4ead890ead2d8840bdd 100644 --- a/Documentation/core-api/kernel-api.rst +++ b/Documentation/core-api/kernel-api.rst @@ -39,17 +39,17 @@ String Manipulation .. kernel-doc:: lib/string.c :export: +Basic Kernel Library Functions +============================== + +The Linux kernel provides more basic utility functions. + Bit Operations -------------- .. kernel-doc:: arch/x86/include/asm/bitops.h :internal: -Basic Kernel Library Functions -============================== - -The Linux kernel provides more basic utility functions. - Bitmap Operations ----------------- @@ -80,6 +80,31 @@ Command-line Parsing .. kernel-doc:: lib/cmdline.c :export: +Sorting +------- + +.. kernel-doc:: lib/sort.c + :export: + +.. kernel-doc:: lib/list_sort.c + :export: + +Text Searching +-------------- + +.. kernel-doc:: lib/textsearch.c + :doc: ts_intro + +.. kernel-doc:: lib/textsearch.c + :export: + +.. kernel-doc:: include/linux/textsearch.h + :functions: textsearch_find textsearch_next \ + textsearch_get_pattern textsearch_get_pattern_len + +CRC and Math Functions in Linux +=============================== + CRC Functions ------------- @@ -103,9 +128,6 @@ CRC Functions .. kernel-doc:: lib/crc-itu-t.c :export: -Math Functions in Linux -======================= - Base 2 log and power Functions ------------------------------ @@ -127,28 +149,6 @@ Division Functions .. kernel-doc:: lib/gcd.c :export: -Sorting -------- - -.. kernel-doc:: lib/sort.c - :export: - -.. kernel-doc:: lib/list_sort.c - :export: - -Text Searching --------------- - -.. kernel-doc:: lib/textsearch.c - :doc: ts_intro - -.. kernel-doc:: lib/textsearch.c - :export: - -.. kernel-doc:: include/linux/textsearch.h - :functions: textsearch_find textsearch_next \ - textsearch_get_pattern textsearch_get_pattern_len - UUID/GUID --------- @@ -284,7 +284,7 @@ Resources Management MTRR Handling ------------- -.. kernel-doc:: arch/x86/kernel/cpu/mtrr/main.c +.. kernel-doc:: arch/x86/kernel/cpu/mtrr/mtrr.c :export: Security Framework diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst index 83351c258cdb9a839014b0b1516cbdcbc224dca7..322851bada16742330c8a2cf4457e5ee30698a97 100644 --- a/Documentation/core-api/refcount-vs-atomic.rst +++ b/Documentation/core-api/refcount-vs-atomic.rst @@ -17,7 +17,7 @@ in order to help maintainers validate their code against the change in these memory ordering guarantees. The terms used through this document try to follow the formal LKMM defined in -github.com/aparri/memory-model/blob/master/Documentation/explanation.txt +tools/memory-model/Documentation/explanation.txt. memory-barriers.txt and atomic_t.txt provide more background to the memory ordering in general and for atomic operations specifically. diff --git a/Documentation/crypto/crypto_engine.rst b/Documentation/crypto/crypto_engine.rst index 8272ac92a14fcd5a55f6607c1586ba04a7629c17..1d56221dfe355ad82ef5e391e2851d612206316c 100644 --- a/Documentation/crypto/crypto_engine.rst +++ b/Documentation/crypto/crypto_engine.rst @@ -8,11 +8,13 @@ The crypto engine API (CE), is a crypto queue manager. Requirement ----------- -You have to put at start of your tfm_ctx the struct crypto_engine_ctx -struct your_tfm_ctx { +You have to put at start of your tfm_ctx the struct crypto_engine_ctx:: + + struct your_tfm_ctx { struct crypto_engine_ctx enginectx; ... -}; + }; + Why: Since CE manage only crypto_async_request, it cannot know the underlying request_type and so have access only on the TFM. So using container_of for accessing __ctx is impossible. diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst index 94c4786f257309a1b8e08c3e21cd8af8a5b039cf..c4ff5d791233921bd82591939c1cdf5fb117e859 100644 --- a/Documentation/crypto/index.rst +++ b/Documentation/crypto/index.rst @@ -20,5 +20,6 @@ for cryptographic use cases, as well as programming examples. architecture devel-algos userspace-if + crypto_engine api api-samples diff --git a/Documentation/dell_rbu.txt b/Documentation/dell_rbu.txt index 0fdb6aa2704c506ada0f893e6649e97ce25000ff..5d1ce7bcd04d43e1cfdfb69db98912656864f836 100644 --- a/Documentation/dell_rbu.txt +++ b/Documentation/dell_rbu.txt @@ -121,10 +121,7 @@ read back the image downloaded. .. note:: - This driver requires a patch for firmware_class.c which has the modified - request_firmware_nowait function. - - Also after updating the BIOS image a user mode application needs to execute + After updating the BIOS image a user mode application needs to execute code which sends the BIOS update request to the BIOS. So on the next reboot the BIOS knows about the new image downloaded and it updates itself. Also don't unload the rbu driver if the image has to be updated. diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index f7a18f2743576ab1e96f6ddc6de2a329f1d40c16..aabc8738b3d8688bbc41951d79698b9e89f144fb 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -120,7 +120,7 @@ A typical out of bounds access report looks like this:: The header of the report discribe what kind of bug happened and what kind of access caused it. It's followed by the description of the accessed slub object -(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and +(see 'SLUB Debug output' section in Documentation/vm/slub.rst for details) and the description of the accessed memory page. In the last section the report shows memory state around the accessed address. diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index e80850eefe138156cc94db035384391207bb23ec..3bf371a938d033c45305531ad7e0b1c5bd3cbc3a 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -151,6 +151,11 @@ Contributing new tests (details) TEST_FILES, TEST_GEN_FILES mean it is the file which is used by test. + * First use the headers inside the kernel source and/or git repo, and then the + system headers. Headers for the kernel release as opposed to headers + installed by the distro on the system should be the primary focus to be able + to find regressions. + Test Harness ============ diff --git a/Documentation/device-mapper/writecache.txt b/Documentation/device-mapper/writecache.txt new file mode 100644 index 0000000000000000000000000000000000000000..01532b3008ae56bb9db1dc1db7d9b9709db5965e --- /dev/null +++ b/Documentation/device-mapper/writecache.txt @@ -0,0 +1,70 @@ +The writecache target caches writes on persistent memory or on SSD. It +doesn't cache reads because reads are supposed to be cached in page cache +in normal RAM. + +When the device is constructed, the first sector should be zeroed or the +first sector should contain valid superblock from previous invocation. + +Constructor parameters: +1. type of the cache device - "p" or "s" + p - persistent memory + s - SSD +2. the underlying device that will be cached +3. the cache device +4. block size (4096 is recommended; the maximum block size is the page + size) +5. the number of optional parameters (the parameters with an argument + count as two) + start_sector n (default: 0) + offset from the start of cache device in 512-byte sectors + high_watermark n (default: 50) + start writeback when the number of used blocks reach this + watermark + low_watermark x (default: 45) + stop writeback when the number of used blocks drops below + this watermark + writeback_jobs n (default: unlimited) + limit the number of blocks that are in flight during + writeback. Setting this value reduces writeback + throughput, but it may improve latency of read requests + autocommit_blocks n (default: 64 for pmem, 65536 for ssd) + when the application writes this amount of blocks without + issuing the FLUSH request, the blocks are automatically + commited + autocommit_time ms (default: 1000) + autocommit time in milliseconds. The data is automatically + commited if this time passes and no FLUSH request is + received + fua (by default on) + applicable only to persistent memory - use the FUA flag + when writing data from persistent memory back to the + underlying device + nofua + applicable only to persistent memory - don't use the FUA + flag when writing back data and send the FLUSH request + afterwards + - some underlying devices perform better with fua, some + with nofua. The user should test it + +Status: +1. error indicator - 0 if there was no error, otherwise error number +2. the number of blocks +3. the number of free blocks +4. the number of blocks under writeback + +Messages: + flush + flush the cache device. The message returns successfully + if the cache device was flushed without an error + flush_on_suspend + flush the cache device on next suspend. Use this message + when you are going to remove the cache device. The proper + sequence for removing the cache device is: + 1. send the "flush_on_suspend" message + 2. load an inactive table with a linear target that maps + to the underlying device + 3. suspend the device + 4. ask for status and verify that there are no errors + 5. resume the device, so that it will use the linear + target + 6. the cache device is now inactive and it can be deleted diff --git a/Documentation/devicetree/bindings/arm/amlogic.txt b/Documentation/devicetree/bindings/arm/amlogic.txt index f747f47922c55dfabdf56a2ba3db6bbf1d704816..69880560c0f0219aa0a875396d0587b4274c97fc 100644 --- a/Documentation/devicetree/bindings/arm/amlogic.txt +++ b/Documentation/devicetree/bindings/arm/amlogic.txt @@ -25,6 +25,10 @@ Boards with the Amlogic Meson8b SoC shall have the following properties: Required root node property: compatible: "amlogic,meson8b"; +Boards with the Amlogic Meson8m2 SoC shall have the following properties: + Required root node property: + compatible: "amlogic,meson8m2"; + Boards with the Amlogic Meson GXBaby SoC shall have the following properties: Required root node property: compatible: "amlogic,meson-gxbb"; @@ -54,6 +58,8 @@ Board compatible values (alphabetically, grouped by SoC): - "hardkernel,odroid-c1" (Meson8b) - "tronfy,mxq" (Meson8b) + - "tronsmart,mxiii-plus" (Meson8m2) + - "amlogic,p200" (Meson gxbb) - "amlogic,p201" (Meson gxbb) - "friendlyarm,nanopi-k2" (Meson gxbb) diff --git a/Documentation/devicetree/bindings/arm/bcm/brcm,bcm2835.txt b/Documentation/devicetree/bindings/arm/bcm/brcm,bcm2835.txt index 3e3efa046ac57ab49cee7d706cfff3b2d93c6247..1e3e29a545e263470fff686f036361ddaf48abfa 100644 --- a/Documentation/devicetree/bindings/arm/bcm/brcm,bcm2835.txt +++ b/Documentation/devicetree/bindings/arm/bcm/brcm,bcm2835.txt @@ -34,6 +34,10 @@ Raspberry Pi 3 Model B Required root node properties: compatible = "raspberrypi,3-model-b", "brcm,bcm2837"; +Raspberry Pi 3 Model B+ +Required root node properties: +compatible = "raspberrypi,3-model-b-plus", "brcm,bcm2837"; + Raspberry Pi Compute Module Required root node properties: compatible = "raspberrypi,compute-module", "brcm,bcm2835"; diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,g3dsys.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,g3dsys.txt new file mode 100644 index 0000000000000000000000000000000000000000..7de43bf41fdc5f4cf5c409559f8301431216c4d6 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,g3dsys.txt @@ -0,0 +1,30 @@ +MediaTek g3dsys controller +============================ + +The MediaTek g3dsys controller provides various clocks and reset controller to +the GPU. + +Required Properties: + +- compatible: Should be: + - "mediatek,mt2701-g3dsys", "syscon": + for MT2701 SoC + - "mediatek,mt7623-g3dsys", "mediatek,mt2701-g3dsys", "syscon": + for MT7623 SoC +- #clock-cells: Must be 1 +- #reset-cells: Must be 1 + +The g3dsys controller uses the common clk binding from +Documentation/devicetree/bindings/clock/clock-bindings.txt +The available clocks are defined in dt-bindings/clock/mt*-clk.h. + +Example: + +g3dsys: clock-controller@13000000 { + compatible = "mediatek,mt7623-g3dsys", + "mediatek,mt2701-g3dsys", + "syscon"; + reg = <0 0x13000000 0 0x200>; + #clock-cells = <1>; + #reset-cells = <1>; +}; diff --git a/Documentation/devicetree/bindings/arm/samsung/samsung-boards.txt b/Documentation/devicetree/bindings/arm/samsung/samsung-boards.txt index 14510b215480b6f13bd3bf765ca5425fd9438f6f..6970f30a3770f8027a2509aab25f4fe75785667e 100644 --- a/Documentation/devicetree/bindings/arm/samsung/samsung-boards.txt +++ b/Documentation/devicetree/bindings/arm/samsung/samsung-boards.txt @@ -21,8 +21,6 @@ Required root node properties: - "samsung,smdk5420" - for Exynos5420-based Samsung SMDK5420 eval board. - "samsung,tm2" - for Exynos5433-based Samsung TM2 board. - "samsung,tm2e" - for Exynos5433-based Samsung TM2E board. - - "samsung,sd5v1" - for Exynos5440-based Samsung board. - - "samsung,ssdk5440" - for Exynos5440-based Samsung board. * Other companies Exynos SoC based * FriendlyARM @@ -68,7 +66,7 @@ Required root node properties: - "insignal,arndale-octa" - for Exynos5420-based Insignal Arndale Octa board. - "insignal,origen" - for Exynos4210-based Insignal Origen board. - - "insignal,origen4412 - for Exynos4412-based Insignal Origen board. + - "insignal,origen4412" - for Exynos4412-based Insignal Origen board. Optional nodes: diff --git a/Documentation/devicetree/bindings/arm/shmobile.txt b/Documentation/devicetree/bindings/arm/shmobile.txt index d3d1df97834f231c98da1e9027337c692bd4158f..d8cf740132c6d869eaaeaed6fb430b294c3ab9e3 100644 --- a/Documentation/devicetree/bindings/arm/shmobile.txt +++ b/Documentation/devicetree/bindings/arm/shmobile.txt @@ -21,6 +21,8 @@ SoCs: compatible = "renesas,r8a7744" - RZ/G1E (R8A77450) compatible = "renesas,r8a7745" + - RZ/G1C (R8A77470) + compatible = "renesas,r8a77470" - R-Car M1A (R8A77781) compatible = "renesas,r8a7778" - R-Car H1 (R8A77790) @@ -45,6 +47,8 @@ SoCs: compatible = "renesas,r8a77970" - R-Car V3H (R8A77980) compatible = "renesas,r8a77980" + - R-Car E3 (R8A77990) + compatible = "renesas,r8a77990" - R-Car D3 (R8A77995) compatible = "renesas,r8a77995" @@ -67,6 +71,8 @@ Boards: compatible = "renesas,draak", "renesas,r8a77995" - Eagle (RTP0RC77970SEB0010S) compatible = "renesas,eagle", "renesas,r8a77970" + - Ebisu (RTP0RC77990SEB0010S) + compatible = "renesas,ebisu", "renesas,r8a77990" - Genmai (RTK772100BC00000BR) compatible = "renesas,genmai", "renesas,r7s72100" - GR-Peach (X28A-M01-E/F) @@ -78,6 +84,8 @@ Boards: compatible = "renesas,h3ulcb", "renesas,r8a7795" - Henninger compatible = "renesas,henninger", "renesas,r8a7791" + - iWave Systems RZ/G1C Single Board Computer (iW-RainboW-G23S) + compatible = "iwave,g23s", "renesas,r8a77470" - iWave Systems RZ/G1E SODIMM SOM Development Platform (iW-RainboW-G22D) compatible = "iwave,g22d", "iwave,g22m", "renesas,r8a7745" - iWave Systems RZ/G1E SODIMM System On Module (iW-RainboW-G22M-SM) @@ -108,7 +116,7 @@ Boards: compatible = "renesas,salvator-x", "renesas,r8a7795" - Salvator-X (RTP0RC7796SIPB0011S) compatible = "renesas,salvator-x", "renesas,r8a7796" - - Salvator-X (RTP0RC7796SIPB0011S (M3N)) + - Salvator-X (RTP0RC7796SIPB0011S (M3-N)) compatible = "renesas,salvator-x", "renesas,r8a77965" - Salvator-XS (Salvator-X 2nd version, RTP0RC7795SIPB0012S) compatible = "renesas,salvator-xs", "renesas,r8a7795" @@ -124,6 +132,8 @@ Boards: compatible = "renesas,sk-rzg1m", "renesas,r8a7743" - Stout (ADAS Starterkit, Y-R-CAR-ADAS-SKH2-BOARD) compatible = "renesas,stout", "renesas,r8a7790" + - V3HSK (Y-ASK-RCAR-V3H-WS10) + compatible = "renesas,v3hsk", "renesas,r8a77980" - V3MSK (Y-ASK-RCAR-V3M-WS10) compatible = "renesas,v3msk", "renesas,r8a77970" - Wheat (RTP0RC7792ASKB0000JE) diff --git a/Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt b/Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt new file mode 100644 index 0000000000000000000000000000000000000000..99980aee26e5fe748512ed78da9f50315ea1abb3 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/stm32/stm32-syscon.txt @@ -0,0 +1,14 @@ +STMicroelectronics STM32 Platforms System Controller + +Properties: + - compatible : should contain two values. First value must be : + - " st,stm32mp157-syscfg " - for stm32mp157 based SoCs, + second value must be always "syscon". + - reg : offset and length of the register set. + + Example: + syscfg: syscon@50020000 { + compatible = "st,stm32mp157-syscfg", "syscon"; + reg = <0x50020000 0x400>; + }; + diff --git a/Documentation/devicetree/bindings/arm/stm32.txt b/Documentation/devicetree/bindings/arm/stm32/stm32.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/stm32.txt rename to Documentation/devicetree/bindings/arm/stm32/stm32.txt diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-mc.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-mc.txt deleted file mode 100644 index f9632bacbd04aebd4ad68c46012777eb0c263204..0000000000000000000000000000000000000000 --- a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra20-mc.txt +++ /dev/null @@ -1,16 +0,0 @@ -NVIDIA Tegra20 MC(Memory Controller) - -Required properties: -- compatible : "nvidia,tegra20-mc" -- reg : Should contain 2 register ranges(address and length); see the - example below. Note that the MC registers are interleaved with the - GART registers, and hence must be represented as multiple ranges. -- interrupts : Should contain MC General interrupt. - -Example: - memory-controller@7000f000 { - compatible = "nvidia,tegra20-mc"; - reg = <0x7000f000 0x024 - 0x7000f03c 0x3c4>; - interrupts = <0 77 0x04>; - }; diff --git a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra30-mc.txt b/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra30-mc.txt deleted file mode 100644 index bdf1a612422bcb004a4c262648c422ec22fa9ea8..0000000000000000000000000000000000000000 --- a/Documentation/devicetree/bindings/arm/tegra/nvidia,tegra30-mc.txt +++ /dev/null @@ -1,18 +0,0 @@ -NVIDIA Tegra30 MC(Memory Controller) - -Required properties: -- compatible : "nvidia,tegra30-mc" -- reg : Should contain 4 register ranges(address and length); see the - example below. Note that the MC registers are interleaved with the - SMMU registers, and hence must be represented as multiple ranges. -- interrupts : Should contain MC General interrupt. - -Example: - memory-controller { - compatible = "nvidia,tegra30-mc"; - reg = <0x7000f000 0x010 - 0x7000f03c 0x1b4 - 0x7000f200 0x028 - 0x7000f284 0x17c>; - interrupts = <0 77 0x04>; - }; diff --git a/Documentation/devicetree/bindings/arm/ux500/boards.txt b/Documentation/devicetree/bindings/arm/ux500/boards.txt index 7334c24625fccf309c5a3708e3e0c1facd0344e5..0fa429534f4910f79193ed177d8da974ee9caf24 100644 --- a/Documentation/devicetree/bindings/arm/ux500/boards.txt +++ b/Documentation/devicetree/bindings/arm/ux500/boards.txt @@ -26,7 +26,7 @@ interrupt-controller: see binding for interrupt-controller/arm,gic.txt timer: - see binding for arm/twd.txt + see binding for timer/arm,twd.txt clocks: see binding for clocks/ux500.txt diff --git a/Documentation/devicetree/bindings/bus/ti-sysc.txt b/Documentation/devicetree/bindings/bus/ti-sysc.txt index 2957a9ae291f4384ea145b61b2fb42876598a04c..d8ed5b780ed9d31dec1ee7b59aa6758abde4b41d 100644 --- a/Documentation/devicetree/bindings/bus/ti-sysc.txt +++ b/Documentation/devicetree/bindings/bus/ti-sysc.txt @@ -79,7 +79,11 @@ Optional properties: mode as for example omap4 L4_CFG_CLKCTRL - clock-names should contain at least "fck", and optionally also "ick" - depending on the SoC and the interconnect target module + depending on the SoC and the interconnect target module, + some interconnect target modules also need additional + optional clocks that can be specified as listed in TRM + for the related CLKCTRL register bits 8 to 15 such as + "dbclk" or "clk32k" depending on their role - ti,hwmods optional TI interconnect module name to use legacy hwmod platform data diff --git a/Documentation/devicetree/bindings/clock/actions,s900-cmu.txt b/Documentation/devicetree/bindings/clock/actions,s900-cmu.txt new file mode 100644 index 0000000000000000000000000000000000000000..93e4fb827cd60645f29d76ef942ea90b3913f008 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/actions,s900-cmu.txt @@ -0,0 +1,47 @@ +* Actions S900 Clock Management Unit (CMU) + +The Actions S900 clock management unit generates and supplies clock to various +controllers within the SoC. The clock binding described here is applicable to +S900 SoC. + +Required Properties: + +- compatible: should be "actions,s900-cmu" +- reg: physical base address of the controller and length of memory mapped + region. +- clocks: Reference to the parent clocks ("hosc", "losc") +- #clock-cells: should be 1. + +Each clock is assigned an identifier, and client nodes can use this identifier +to specify the clock which they consume. + +All available clocks are defined as preprocessor macros in +dt-bindings/clock/actions,s900-cmu.h header and can be used in device +tree sources. + +External clocks: + +The hosc clock used as input for the plls is generated outside the SoC. It is +expected that it is defined using standard clock bindings as "hosc". + +Actions S900 CMU also requires one more clock: + - "losc" - internal low frequency oscillator + +Example: Clock Management Unit node: + + cmu: clock-controller@e0160000 { + compatible = "actions,s900-cmu"; + reg = <0x0 0xe0160000 0x0 0x1000>; + clocks = <&hosc>, <&losc>; + #clock-cells = <1>; + }; + +Example: UART controller node that consumes clock generated by the clock +management unit: + + uart: serial@e012a000 { + compatible = "actions,s900-uart", "actions,owl-uart"; + reg = <0x0 0xe012a000 0x0 0x2000>; + interrupts = ; + clocks = <&cmu CLK_UART5>; + }; diff --git a/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt b/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt index 786dc39ca904ddd02ede6af5b7617fb9ef2e7d32..3a880528030ea61fd4c6d7f1a869a2ce7c3c3938 100644 --- a/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt +++ b/Documentation/devicetree/bindings/clock/amlogic,gxbb-aoclkc.txt @@ -9,6 +9,7 @@ Required Properties: - GXBB (S905) : "amlogic,meson-gxbb-aoclkc" - GXL (S905X, S905D) : "amlogic,meson-gxl-aoclkc" - GXM (S912) : "amlogic,meson-gxm-aoclkc" + - AXG (A113D, A113X) : "amlogic,meson-axg-aoclkc" followed by the common "amlogic,meson-gx-aoclkc" - #clock-cells: should be 1. diff --git a/Documentation/devicetree/bindings/clock/amlogic,gxbb-clkc.txt b/Documentation/devicetree/bindings/clock/amlogic,gxbb-clkc.txt index e2b377ed6f915440997b3715eed53a94775ab70d..e950599566a99876b9be5e7536828e469fef4f80 100644 --- a/Documentation/devicetree/bindings/clock/amlogic,gxbb-clkc.txt +++ b/Documentation/devicetree/bindings/clock/amlogic,gxbb-clkc.txt @@ -10,9 +10,6 @@ Required Properties: "amlogic,gxl-clkc" for GXL and GXM SoC, "amlogic,axg-clkc" for AXG SoC. -- reg: physical base address of the clock controller and length of memory - mapped region. - - #clock-cells: should be 1. Each clock is assigned an identifier and client nodes can use this identifier @@ -20,13 +17,22 @@ to specify the clock which they consume. All available clocks are defined as preprocessor macros in the dt-bindings/clock/gxbb-clkc.h header and can be used in device tree sources. +Parent node should have the following properties : +- compatible: "syscon", "simple-mfd, and "amlogic,meson-gx-hhi-sysctrl" or + "amlogic,meson-axg-hhi-sysctrl" +- reg: base address and size of the HHI system control register space. + Example: Clock controller node: - clkc: clock-controller@c883c000 { +sysctrl: system-controller@0 { + compatible = "amlogic,meson-gx-hhi-sysctrl", "syscon", "simple-mfd"; + reg = <0 0 0 0x400>; + + clkc: clock-controller { #clock-cells = <1>; compatible = "amlogic,gxbb-clkc"; - reg = <0x0 0xc883c000 0x0 0x3db>; }; +}; Example: UART controller node that consumes the clock generated by the clock controller: diff --git a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt index f8e4a93466cbfc8d5e7f1f11e89424843965fc6c..ab730ea0a560f3b84a7c854bb1d7e13f468ed79c 100644 --- a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt +++ b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt @@ -276,36 +276,38 @@ These clock IDs are defined in: clk_ts_500_ref genpll2 2 BCM_SR_GENPLL2_TS_500_REF_CLK clk_125_nitro genpll2 3 BCM_SR_GENPLL2_125_NITRO_CLK clk_chimp genpll2 4 BCM_SR_GENPLL2_CHIMP_CLK - clk_nic_flash genpll2 5 BCM_SR_GENPLL2_NIC_FLASH + clk_nic_flash genpll2 5 BCM_SR_GENPLL2_NIC_FLASH_CLK + clk_fs genpll2 6 BCM_SR_GENPLL2_FS_CLK genpll3 crystal 0 BCM_SR_GENPLL3 clk_hsls genpll3 1 BCM_SR_GENPLL3_HSLS_CLK clk_sdio genpll3 2 BCM_SR_GENPLL3_SDIO_CLK genpll4 crystal 0 BCM_SR_GENPLL4 - ccn genpll4 1 BCM_SR_GENPLL4_CCN_CLK + clk_ccn genpll4 1 BCM_SR_GENPLL4_CCN_CLK clk_tpiu_pll genpll4 2 BCM_SR_GENPLL4_TPIU_PLL_CLK - noc_clk genpll4 3 BCM_SR_GENPLL4_NOC_CLK + clk_noc genpll4 3 BCM_SR_GENPLL4_NOC_CLK clk_chclk_fs4 genpll4 4 BCM_SR_GENPLL4_CHCLK_FS4_CLK clk_bridge_fscpu genpll4 5 BCM_SR_GENPLL4_BRIDGE_FSCPU_CLK - genpll5 crystal 0 BCM_SR_GENPLL5 - fs4_hf_clk genpll5 1 BCM_SR_GENPLL5_FS4_HF_CLK - crypto_ae_clk genpll5 2 BCM_SR_GENPLL5_CRYPTO_AE_CLK - raid_ae_clk genpll5 3 BCM_SR_GENPLL5_RAID_AE_CLK + clk_fs4_hf genpll5 1 BCM_SR_GENPLL5_FS4_HF_CLK + clk_crypto_ae genpll5 2 BCM_SR_GENPLL5_CRYPTO_AE_CLK + clk_raid_ae genpll5 3 BCM_SR_GENPLL5_RAID_AE_CLK genpll6 crystal 0 BCM_SR_GENPLL6 - 48_usb genpll6 1 BCM_SR_GENPLL6_48_USB_CLK + clk_48_usb genpll6 1 BCM_SR_GENPLL6_48_USB_CLK lcpll0 crystal 0 BCM_SR_LCPLL0 clk_sata_refp lcpll0 1 BCM_SR_LCPLL0_SATA_REFP_CLK clk_sata_refn lcpll0 2 BCM_SR_LCPLL0_SATA_REFN_CLK - clk_usb_ref lcpll0 3 BCM_SR_LCPLL0_USB_REF_CLK - sata_refpn lcpll0 3 BCM_SR_LCPLL0_SATA_REFPN_CLK + clk_sata_350 lcpll0 3 BCM_SR_LCPLL0_SATA_350_CLK + clk_sata_500 lcpll0 4 BCM_SR_LCPLL0_SATA_500_CLK lcpll1 crystal 0 BCM_SR_LCPLL1 - wan lcpll1 1 BCM_SR_LCPLL0_WAN_CLK + clk_wan lcpll1 1 BCM_SR_LCPLL1_WAN_CLK + clk_usb_ref lcpll1 2 BCM_SR_LCPLL1_USB_REF_CLK + clk_crmu_ts lcpll1 3 BCM_SR_LCPLL1_CRMU_TS_CLK lcpll_pcie crystal 0 BCM_SR_LCPLL_PCIE - pcie_phy_ref lcpll1 1 BCM_SR_LCPLL_PCIE_PHY_REF_CLK + clk_pcie_phy_ref lcpll1 1 BCM_SR_LCPLL_PCIE_PHY_REF_CLK diff --git a/Documentation/devicetree/bindings/clock/nuvoton,npcm750-clk.txt b/Documentation/devicetree/bindings/clock/nuvoton,npcm750-clk.txt new file mode 100644 index 0000000000000000000000000000000000000000..f82064546d1117519bfa1c9a6ca497e1821d0e0c --- /dev/null +++ b/Documentation/devicetree/bindings/clock/nuvoton,npcm750-clk.txt @@ -0,0 +1,100 @@ +* Nuvoton NPCM7XX Clock Controller + +Nuvoton Poleg BMC NPCM7XX contains an integrated clock controller, which +generates and supplies clocks to all modules within the BMC. + +External clocks: + +There are six fixed clocks that are generated outside the BMC. All clocks are of +a known fixed value that cannot be changed. clk_refclk, clk_mcbypck and +clk_sysbypck are inputs to the clock controller. +clk_rg1refck, clk_rg2refck and clk_xin are external clocks suppling the +network. They are set on the device tree, but not used by the clock module. The +network devices use them directly. +Example can be found below. + +All available clocks are defined as preprocessor macros in: +dt-bindings/clock/nuvoton,npcm7xx-clock.h +and can be reused as DT sources. + +Required Properties of clock controller: + + - compatible: "nuvoton,npcm750-clk" : for clock controller of Nuvoton + Poleg BMC NPCM750 + + - reg: physical base address of the clock controller and length of + memory mapped region. + + - #clock-cells: should be 1. + +Example: Clock controller node: + + clk: clock-controller@f0801000 { + compatible = "nuvoton,npcm750-clk"; + #clock-cells = <1>; + reg = <0xf0801000 0x1000>; + clock-names = "refclk", "sysbypck", "mcbypck"; + clocks = <&clk_refclk>, <&clk_sysbypck>, <&clk_mcbypck>; + }; + +Example: Required external clocks for network: + + /* external reference clock */ + clk_refclk: clk-refclk { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <25000000>; + clock-output-names = "refclk"; + }; + + /* external reference clock for cpu. float in normal operation */ + clk_sysbypck: clk-sysbypck { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <800000000>; + clock-output-names = "sysbypck"; + }; + + /* external reference clock for MC. float in normal operation */ + clk_mcbypck: clk-mcbypck { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <800000000>; + clock-output-names = "mcbypck"; + }; + + /* external clock signal rg1refck, supplied by the phy */ + clk_rg1refck: clk-rg1refck { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <125000000>; + clock-output-names = "clk_rg1refck"; + }; + + /* external clock signal rg2refck, supplied by the phy */ + clk_rg2refck: clk-rg2refck { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <125000000>; + clock-output-names = "clk_rg2refck"; + }; + + clk_xin: clk-xin { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <50000000>; + clock-output-names = "clk_xin"; + }; + + +Example: GMAC controller node that consumes two clocks: a generated clk by the +clock controller and a fixed clock from DT (clk_rg1refck). + + ethernet0: ethernet@f0802000 { + compatible = "snps,dwmac"; + reg = <0xf0802000 0x2000>; + interrupts = <0 14 4>; + interrupt-names = "macirq"; + clocks = <&clk_rg1refck>, <&clk NPCM7XX_CLK_AHB>; + clock-names = "stmmaceth", "clk_gmac"; + }; diff --git a/Documentation/devicetree/bindings/clock/qcom,gcc.txt b/Documentation/devicetree/bindings/clock/qcom,gcc.txt index 551d03be966587fe27d9f486c0f5b712a9f6f016..664ea1fd6c76a18ce158b4fc01b536ba31dfeeda 100644 --- a/Documentation/devicetree/bindings/clock/qcom,gcc.txt +++ b/Documentation/devicetree/bindings/clock/qcom,gcc.txt @@ -17,7 +17,9 @@ Required properties : "qcom,gcc-msm8974pro-ac" "qcom,gcc-msm8994" "qcom,gcc-msm8996" + "qcom,gcc-msm8998" "qcom,gcc-mdm9615" + "qcom,gcc-sdm845" - reg : shall contain base register location and length - #clock-cells : shall contain 1 diff --git a/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt b/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt new file mode 100644 index 0000000000000000000000000000000000000000..3c007653da31249d176eb71384a0817371b7659c --- /dev/null +++ b/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt @@ -0,0 +1,22 @@ +Qualcomm Technologies, Inc. RPMh Clocks +------------------------------------------------------- + +Resource Power Manager Hardened (RPMh) manages shared resources on +some Qualcomm Technologies Inc. SoCs. It accepts clock requests from +other hardware subsystems via RSC to control clocks. + +Required properties : +- compatible : shall contain "qcom,sdm845-rpmh-clk" + +- #clock-cells : must contain 1 + +Example : + +#include + + &apps_rsc { + rpmhcc: clock-controller { + compatible = "qcom,sdm845-rpmh-clk"; + #clock-cells = <1>; + }; + }; diff --git a/Documentation/devicetree/bindings/clock/qcom,videocc.txt b/Documentation/devicetree/bindings/clock/qcom,videocc.txt new file mode 100644 index 0000000000000000000000000000000000000000..e7c035afa778fb3acade54d7645056f3460ceb38 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/qcom,videocc.txt @@ -0,0 +1,19 @@ +Qualcomm Video Clock & Reset Controller Binding +----------------------------------------------- + +Required properties : +- compatible : shall contain "qcom,sdm845-videocc" +- reg : shall contain base register location and length +- #clock-cells : from common clock binding, shall contain 1. +- #power-domain-cells : from generic power domain binding, shall contain 1. + +Optional properties : +- #reset-cells : from common reset binding, shall contain 1. + +Example: + videocc: clock-controller@ab00000 { + compatible = "qcom,sdm845-videocc"; + reg = <0xab00000 0x10000>; + #clock-cells = <1>; + #power-domain-cells = <1>; + }; diff --git a/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.txt b/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.txt index 773a5226342fc1d2945ea79bd32abb1d0f8e7c8d..db542abadb75bf0ca395ac0910506bce458fecee 100644 --- a/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.txt +++ b/Documentation/devicetree/bindings/clock/renesas,cpg-mssr.txt @@ -15,6 +15,7 @@ Required Properties: - compatible: Must be one of: - "renesas,r8a7743-cpg-mssr" for the r8a7743 SoC (RZ/G1M) - "renesas,r8a7745-cpg-mssr" for the r8a7745 SoC (RZ/G1E) + - "renesas,r8a77470-cpg-mssr" for the r8a77470 SoC (RZ/G1C) - "renesas,r8a7790-cpg-mssr" for the r8a7790 SoC (R-Car H2) - "renesas,r8a7791-cpg-mssr" for the r8a7791 SoC (R-Car M2-W) - "renesas,r8a7792-cpg-mssr" for the r8a7792 SoC (R-Car V2H) @@ -25,6 +26,7 @@ Required Properties: - "renesas,r8a77965-cpg-mssr" for the r8a77965 SoC (R-Car M3-N) - "renesas,r8a77970-cpg-mssr" for the r8a77970 SoC (R-Car V3M) - "renesas,r8a77980-cpg-mssr" for the r8a77980 SoC (R-Car V3H) + - "renesas,r8a77990-cpg-mssr" for the r8a77990 SoC (R-Car E3) - "renesas,r8a77995-cpg-mssr" for the r8a77995 SoC (R-Car D3) - reg: Base address and length of the memory resource used by the CPG/MSSR @@ -33,10 +35,12 @@ Required Properties: - clocks: References to external parent clocks, one entry for each entry in clock-names - clock-names: List of external parent clock names. Valid names are: - - "extal" (r8a7743, r8a7745, r8a7790, r8a7791, r8a7792, r8a7793, r8a7794, - r8a7795, r8a7796, r8a77965, r8a77970, r8a77980, r8a77995) + - "extal" (r8a7743, r8a7745, r8a77470, r8a7790, r8a7791, r8a7792, + r8a7793, r8a7794, r8a7795, r8a7796, r8a77965, r8a77970, + r8a77980, r8a77990, r8a77995) - "extalr" (r8a7795, r8a7796, r8a77965, r8a77970, r8a77980) - - "usb_extal" (r8a7743, r8a7745, r8a7790, r8a7791, r8a7793, r8a7794) + - "usb_extal" (r8a7743, r8a7745, r8a77470, r8a7790, r8a7791, r8a7793, + r8a7794) - #clock-cells: Must be 2 - For CPG core clocks, the two clock specifier cells must be "CPG_CORE" diff --git a/Documentation/devicetree/bindings/clock/rockchip.txt b/Documentation/devicetree/bindings/clock/rockchip.txt deleted file mode 100644 index 22f6769e5d4a2fb61ac2e5302ac0686b17671213..0000000000000000000000000000000000000000 --- a/Documentation/devicetree/bindings/clock/rockchip.txt +++ /dev/null @@ -1,77 +0,0 @@ -Device Tree Clock bindings for arch-rockchip - -This binding uses the common clock binding[1]. - -[1] Documentation/devicetree/bindings/clock/clock-bindings.txt - -== Gate clocks == - -These bindings are deprecated! -Please use the soc specific CRU bindings instead. - -The gate registers form a continuos block which makes the dt node -structure a matter of taste, as either all gates can be put into -one gate clock spanning all registers or they can be divided into -the 10 individual gates containing 16 clocks each. -The code supports both approaches. - -Required properties: -- compatible : "rockchip,rk2928-gate-clk" -- reg : shall be the control register address(es) for the clock. -- #clock-cells : from common clock binding; shall be set to 1 -- clock-output-names : the corresponding gate names that the clock controls -- clocks : should contain the parent clock for each individual gate, - therefore the number of clocks elements should match the number of - clock-output-names - -Example using multiple gate clocks: - - clk_gates0: gate-clk@200000d0 { - compatible = "rockchip,rk2928-gate-clk"; - reg = <0x200000d0 0x4>; - clocks = <&dummy>, <&dummy>, - <&dummy>, <&dummy>, - <&dummy>, <&dummy>, - <&dummy>, <&dummy>, - <&dummy>, <&dummy>, - <&dummy>, <&dummy>, - <&dummy>, <&dummy>, - <&dummy>, <&dummy>; - - clock-output-names = - "gate_core_periph", "gate_cpu_gpll", - "gate_ddrphy", "gate_aclk_cpu", - "gate_hclk_cpu", "gate_pclk_cpu", - "gate_atclk_cpu", "gate_i2s0", - "gate_i2s0_frac", "gate_i2s1", - "gate_i2s1_frac", "gate_i2s2", - "gate_i2s2_frac", "gate_spdif", - "gate_spdif_frac", "gate_testclk"; - - #clock-cells = <1>; - }; - - clk_gates1: gate-clk@200000d4 { - compatible = "rockchip,rk2928-gate-clk"; - reg = <0x200000d4 0x4>; - clocks = <&xin24m>, <&xin24m>, - <&xin24m>, <&dummy>, - <&dummy>, <&xin24m>, - <&xin24m>, <&dummy>, - <&xin24m>, <&dummy>, - <&xin24m>, <&dummy>, - <&xin24m>, <&dummy>, - <&xin24m>, <&dummy>; - - clock-output-names = - "gate_timer0", "gate_timer1", - "gate_timer2", "gate_jtag", - "gate_aclk_lcdc1_src", "gate_otgphy0", - "gate_otgphy1", "gate_ddr_gpll", - "gate_uart0", "gate_frac_uart0", - "gate_uart1", "gate_frac_uart1", - "gate_uart2", "gate_frac_uart2", - "gate_uart3", "gate_frac_uart3"; - - #clock-cells = <1>; - }; diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt index 7364953d0d0bd8c22699366a50981bef3adb5386..45ac19bfa0a9ff4d73cb6cffcd16560dd0fe6b0c 100644 --- a/Documentation/devicetree/bindings/clock/st/st,clkgen.txt +++ b/Documentation/devicetree/bindings/clock/st/st,clkgen.txt @@ -31,10 +31,10 @@ This binding uses the common clock binding[1]. Each subnode should use the binding described in [2]..[7] [1] Documentation/devicetree/bindings/clock/clock-bindings.txt -[3] Documentation/devicetree/bindings/clock/st,clkgen-mux.txt -[4] Documentation/devicetree/bindings/clock/st,clkgen-pll.txt -[7] Documentation/devicetree/bindings/clock/st,quadfs.txt -[8] Documentation/devicetree/bindings/clock/st,flexgen.txt +[3] Documentation/devicetree/bindings/clock/st/st,clkgen-mux.txt +[4] Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt +[7] Documentation/devicetree/bindings/clock/st/st,quadfs.txt +[8] Documentation/devicetree/bindings/clock/st/st,flexgen.txt Required properties: diff --git a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt index 460ef27b1008453dd879cbcdabc83cc9189d846f..47d2e902ced43018a1f58eab35260ffcca239129 100644 --- a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt +++ b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt @@ -21,6 +21,7 @@ Required properties : - "allwinner,sun50i-a64-r-ccu" - "allwinner,sun50i-h5-ccu" - "allwinner,sun50i-h6-ccu" + - "allwinner,sun50i-h6-r-ccu" - "nextthing,gr8-ccu" - reg: Must contain the registers base address and length @@ -35,7 +36,7 @@ Required properties : For the main CCU on H6, one more clock is needed: - "iosc": the SoC's internal frequency oscillator -For the PRCM CCUs on A83T/H3/A64, two more clocks are needed: +For the PRCM CCUs on A83T/H3/A64/H6, two more clocks are needed: - "pll-periph": the SoC's peripheral PLL from the main CCU - "iosc": the SoC's internal frequency oscillator diff --git a/Documentation/devicetree/bindings/clock/ti/gate.txt b/Documentation/devicetree/bindings/clock/ti/gate.txt index 03f8fdee62a7e3e2c559789eb501f9f12d6542a5..56d603c1f71685678a6cf9d8a52da537cff721bd 100644 --- a/Documentation/devicetree/bindings/clock/ti/gate.txt +++ b/Documentation/devicetree/bindings/clock/ti/gate.txt @@ -10,7 +10,7 @@ will be controlled instead and the corresponding hw-ops for that is used. [1] Documentation/devicetree/bindings/clock/clock-bindings.txt -[2] Documentation/devicetree/bindings/clock/gate-clock.txt +[2] Documentation/devicetree/bindings/clock/gpio-gate-clock.txt [3] Documentation/devicetree/bindings/clock/ti/clockdomain.txt Required properties: diff --git a/Documentation/devicetree/bindings/clock/ti/interface.txt b/Documentation/devicetree/bindings/clock/ti/interface.txt index 3111a409fea6cebb739a01592874d2c88d56328b..3f4704040140d1dd350148da2dff43cff63f9120 100644 --- a/Documentation/devicetree/bindings/clock/ti/interface.txt +++ b/Documentation/devicetree/bindings/clock/ti/interface.txt @@ -9,7 +9,7 @@ companion clock finding (match corresponding functional gate clock) and hardware autoidle enable / disable. [1] Documentation/devicetree/bindings/clock/clock-bindings.txt -[2] Documentation/devicetree/bindings/clock/gate-clock.txt +[2] Documentation/devicetree/bindings/clock/gpio-gate-clock.txt Required properties: - compatible : shall be one of: diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt index d36f07e0a2bb43af831d64c6b671f18a6e7a82ef..0551c78619de807b99908d76eaf76cd6831ee3a5 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt @@ -8,7 +8,7 @@ Required properties: "intermediate" - A parent of "cpu" clock which is used as "intermediate" clock source (usually MAINPLL) when the original CPU PLL is under transition and not stable yet. - Please refer to Documentation/devicetree/bindings/clk/clock-bindings.txt for + Please refer to Documentation/devicetree/bindings/clock/clock-bindings.txt for generic clock consumer properties. - operating-points-v2: Please refer to Documentation/devicetree/bindings/opp/opp.txt for detail. diff --git a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt index d6d2833482c9b09ba1f7c664ea4952b8f2b63464..fc2bcbe26b1e5fae443fdaa96a016a48655e8870 100644 --- a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt +++ b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt @@ -12,7 +12,7 @@ Required properties: - clocks: Phandles for clock specified in "clock-names" property - clock-names : The name of clock used by the DFI, must be "pclk_ddr_mon"; -- operating-points-v2: Refer to Documentation/devicetree/bindings/power/opp.txt +- operating-points-v2: Refer to Documentation/devicetree/bindings/opp/opp.txt for details. - center-supply: DMC supply node. - status: Marks the node enabled/disabled. diff --git a/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt b/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt index 0047b1394c704d5ad4e558f8373b5b6f5ea7972b..2c887536258cfef57309664baf57d620074bcb42 100644 --- a/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt +++ b/Documentation/devicetree/bindings/display/bridge/adi,adv7511.txt @@ -14,7 +14,13 @@ Required properties: "adi,adv7513" "adi,adv7533" -- reg: I2C slave address +- reg: I2C slave addresses + The ADV7511 internal registers are split into four pages exposed through + different I2C addresses, creating four register maps. Each map has it own + I2C address and acts as a standard slave device on the I2C bus. The main + address is mandatory, others are optional and revert to defaults if not + specified. + The ADV7511 supports a large number of input data formats that differ by their color depth, color format, clock mode, bit justification and random @@ -70,6 +76,9 @@ Optional properties: rather than generate its own timings for HDMI output. - clocks: from common clock binding: reference to the CEC clock. - clock-names: from common clock binding: must be "cec". +- reg-names : Names of maps with programmable addresses. + It can contain any map needing a non-default address. + Possible maps names are : "main", "edid", "cec", "packet" Required nodes: @@ -88,7 +97,12 @@ Example adv7511w: hdmi@39 { compatible = "adi,adv7511w"; - reg = <39>; + /* + * The EDID page will be accessible on address 0x66 on the I2C + * bus. All other maps continue to use their default addresses. + */ + reg = <0x39>, <0x66>; + reg-names = "main", "edid"; interrupt-parent = <&gpio3>; interrupts = <29 IRQ_TYPE_EDGE_FALLING>; clocks = <&cec_clock>; diff --git a/Documentation/devicetree/bindings/display/bridge/cdns,dsi.txt b/Documentation/devicetree/bindings/display/bridge/cdns,dsi.txt new file mode 100644 index 0000000000000000000000000000000000000000..f5725bb6c61ca9bd6aa5cb645c4aa006e468a5e7 --- /dev/null +++ b/Documentation/devicetree/bindings/display/bridge/cdns,dsi.txt @@ -0,0 +1,133 @@ +Cadence DSI bridge +================== + +The Cadence DSI bridge is a DPI to DSI bridge supporting up to 4 DSI lanes. + +Required properties: +- compatible: should be set to "cdns,dsi". +- reg: physical base address and length of the controller's registers. +- interrupts: interrupt line connected to the DSI bridge. +- clocks: DSI bridge clocks. +- clock-names: must contain "dsi_p_clk" and "dsi_sys_clk". +- phys: phandle link to the MIPI D-PHY controller. +- phy-names: must contain "dphy". +- #address-cells: must be set to 1. +- #size-cells: must be set to 0. + +Optional properties: +- resets: DSI reset lines. +- reset-names: can contain "dsi_p_rst". + +Required subnodes: +- ports: Ports as described in Documentation/devicetree/bindings/graph.txt. + 2 ports are available: + * port 0: this port is only needed if some of your DSI devices are + controlled through an external bus like I2C or SPI. Can have at + most 4 endpoints. The endpoint number is directly encoding the + DSI virtual channel used by this device. + * port 1: represents the DPI input. + Other ports will be added later to support the new kind of inputs. + +- one subnode per DSI device connected on the DSI bus. Each DSI device should + contain a reg property encoding its virtual channel. + +Cadence DPHY +============ + +Cadence DPHY block. + +Required properties: +- compatible: should be set to "cdns,dphy". +- reg: physical base address and length of the DPHY registers. +- clocks: DPHY reference clocks. +- clock-names: must contain "psm" and "pll_ref". +- #phy-cells: must be set to 0. + + +Example: + dphy0: dphy@fd0e0000{ + compatible = "cdns,dphy"; + reg = <0x0 0xfd0e0000 0x0 0x1000>; + clocks = <&psm_clk>, <&pll_ref_clk>; + clock-names = "psm", "pll_ref"; + #phy-cells = <0>; + }; + + dsi0: dsi@fd0c0000 { + compatible = "cdns,dsi"; + reg = <0x0 0xfd0c0000 0x0 0x1000>; + clocks = <&pclk>, <&sysclk>; + clock-names = "dsi_p_clk", "dsi_sys_clk"; + interrupts = <1>; + phys = <&dphy0>; + phy-names = "dphy"; + #address-cells = <1>; + #size-cells = <0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@1 { + reg = <1>; + dsi0_dpi_input: endpoint { + remote-endpoint = <&xxx_dpi_output>; + }; + }; + }; + + panel: dsi-dev@0 { + compatible = ""; + reg = <0>; + }; + }; + +or + + dsi0: dsi@fd0c0000 { + compatible = "cdns,dsi"; + reg = <0x0 0xfd0c0000 0x0 0x1000>; + clocks = <&pclk>, <&sysclk>; + clock-names = "dsi_p_clk", "dsi_sys_clk"; + interrupts = <1>; + phys = <&dphy1>; + phy-names = "dphy"; + #address-cells = <1>; + #size-cells = <0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + #address-cells = <1>; + #size-cells = <0>; + + dsi0_output: endpoint@0 { + reg = <0>; + remote-endpoint = <&dsi_panel_input>; + }; + }; + + port@1 { + reg = <1>; + dsi0_dpi_input: endpoint { + remote-endpoint = <&xxx_dpi_output>; + }; + }; + }; + }; + + i2c@xxx { + panel: panel@59 { + compatible = ""; + reg = <0x59>; + + port { + dsi_panel_input: endpoint { + remote-endpoint = <&dsi0_output>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/display/bridge/renesas,dw-hdmi.txt b/Documentation/devicetree/bindings/display/bridge/renesas,dw-hdmi.txt index 3a72a103a18a8c40d1f09957123d6240d5685c8f..a41d280c3f9f74a5dc2b600a868ec0f8d6e73426 100644 --- a/Documentation/devicetree/bindings/display/bridge/renesas,dw-hdmi.txt +++ b/Documentation/devicetree/bindings/display/bridge/renesas,dw-hdmi.txt @@ -14,6 +14,7 @@ Required properties: - compatible : Shall contain one or more of - "renesas,r8a7795-hdmi" for R8A7795 (R-Car H3) compatible HDMI TX - "renesas,r8a7796-hdmi" for R8A7796 (R-Car M3-W) compatible HDMI TX + - "renesas,r8a77965-hdmi" for R8A77965 (R-Car M3-N) compatible HDMI TX - "renesas,rcar-gen3-hdmi" for the generic R-Car Gen3 compatible HDMI TX When compatible with generic versions, nodes must list the SoC-specific diff --git a/Documentation/devicetree/bindings/display/bridge/tda998x.txt b/Documentation/devicetree/bindings/display/bridge/tda998x.txt index 24cc2466185a490067157b0b923a1fde69432649..f5a02f61dd36f1c68acc89f15507a144188db8c0 100644 --- a/Documentation/devicetree/bindings/display/bridge/tda998x.txt +++ b/Documentation/devicetree/bindings/display/bridge/tda998x.txt @@ -27,7 +27,10 @@ Optional properties: in question is used. The implementation allows one or two DAIs. If two DAIs are defined, they must be of different type. -[1] Documentation/sound/alsa/soc/DAI.txt + - nxp,calib-gpios: calibration GPIO, which must correspond with the + gpio used for the TDA998x interrupt pin. + +[1] Documentation/sound/soc/dai.rst [2] include/dt-bindings/display/tda998x.h Example: diff --git a/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt b/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt new file mode 100644 index 0000000000000000000000000000000000000000..37f0c04d5a28b60d7a121b566974bad37f69e2f2 --- /dev/null +++ b/Documentation/devicetree/bindings/display/bridge/thine,thc63lvd1024.txt @@ -0,0 +1,60 @@ +Thine Electronics THC63LVD1024 LVDS decoder +------------------------------------------- + +The THC63LVD1024 is a dual link LVDS receiver designed to convert LVDS streams +to parallel data outputs. The chip supports single/dual input/output modes, +handling up to two LVDS input streams and up to two digital CMOS/TTL outputs. + +Single or dual operation mode, output data mapping and DDR output modes are +configured through input signals and the chip does not expose any control bus. + +Required properties: +- compatible: Shall be "thine,thc63lvd1024" +- vcc-supply: Power supply for TTL output, TTL CLOCKOUT signal, LVDS input, + PPL and digital circuitry + +Optional properties: +- powerdown-gpios: Power down GPIO signal, pin name "/PDWN". Active low +- oe-gpios: Output enable GPIO signal, pin name "OE". Active high + +The THC63LVD1024 video port connections are modeled according +to OF graph bindings specified by Documentation/devicetree/bindings/graph.txt + +Required video port nodes: +- port@0: First LVDS input port +- port@2: First digital CMOS/TTL parallel output + +Optional video port nodes: +- port@1: Second LVDS input port +- port@3: Second digital CMOS/TTL parallel output + +Example: +-------- + + thc63lvd1024: lvds-decoder { + compatible = "thine,thc63lvd1024"; + + vcc-supply = <®_lvds_vcc>; + powerdown-gpios = <&gpio4 15 GPIO_ACTIVE_LOW>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + + lvds_dec_in_0: endpoint { + remote-endpoint = <&lvds_out>; + }; + }; + + port@2{ + reg = <2>; + + lvds_dec_out_2: endpoint { + remote-endpoint = <&adv7511_in>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/display/exynos/exynos5433-decon.txt b/Documentation/devicetree/bindings/display/exynos/exynos5433-decon.txt index fc2588292a68173347521e87ce80b9fa1b9ea19a..775193e1c64147212afd32c2e4407621cb56b159 100644 --- a/Documentation/devicetree/bindings/display/exynos/exynos5433-decon.txt +++ b/Documentation/devicetree/bindings/display/exynos/exynos5433-decon.txt @@ -19,7 +19,8 @@ Required properties: clock-names property. - clock-names: list of clock names sorted in the same order as the clocks property. Must contain "pclk", "aclk_decon", "aclk_smmu_decon0x", - "aclk_xiu_decon0x", "pclk_smmu_decon0x", clk_decon_vclk", + "aclk_xiu_decon0x", "pclk_smmu_decon0x", "aclk_smmu_decon1x", + "aclk_xiu_decon1x", "pclk_smmu_decon1x", clk_decon_vclk", "sclk_decon_eclk" - ports: contains a port which is connected to mic node. address-cells and size-cells must 1 and 0, respectively. @@ -34,10 +35,14 @@ decon: decon@13800000 { clocks = <&cmu_disp CLK_ACLK_DECON>, <&cmu_disp CLK_ACLK_SMMU_DECON0X>, <&cmu_disp CLK_ACLK_XIU_DECON0X>, <&cmu_disp CLK_PCLK_SMMU_DECON0X>, + <&cmu_disp CLK_ACLK_SMMU_DECON1X>, + <&cmu_disp CLK_ACLK_XIU_DECON1X>, + <&cmu_disp CLK_PCLK_SMMU_DECON1X>, <&cmu_disp CLK_SCLK_DECON_VCLK>, <&cmu_disp CLK_SCLK_DECON_ECLK>; clock-names = "aclk_decon", "aclk_smmu_decon0x", "aclk_xiu_decon0x", - "pclk_smmu_decon0x", "sclk_decon_vclk", "sclk_decon_eclk"; + "pclk_smmu_decon0x", "aclk_smmu_decon1x", "aclk_xiu_decon1x", + "pclk_smmu_decon1x", "sclk_decon_vclk", "sclk_decon_eclk"; interrupt-names = "vsync", "lcd_sys"; interrupts = <0 202 0>, <0 203 0>; diff --git a/Documentation/devicetree/bindings/display/renesas,du.txt b/Documentation/devicetree/bindings/display/renesas,du.txt index c9cd17f99702335ceb92b580a9d9576c19d02487..7c6854bd0a04e194e414a5301b0dac078fcd57f4 100644 --- a/Documentation/devicetree/bindings/display/renesas,du.txt +++ b/Documentation/devicetree/bindings/display/renesas,du.txt @@ -13,6 +13,7 @@ Required Properties: - "renesas,du-r8a7794" for R8A7794 (R-Car E2) compatible DU - "renesas,du-r8a7795" for R8A7795 (R-Car H3) compatible DU - "renesas,du-r8a7796" for R8A7796 (R-Car M3-W) compatible DU + - "renesas,du-r8a77965" for R8A77965 (R-Car M3-N) compatible DU - "renesas,du-r8a77970" for R8A77970 (R-Car V3M) compatible DU - "renesas,du-r8a77995" for R8A77995 (R-Car D3) compatible DU @@ -47,20 +48,21 @@ bindings specified in Documentation/devicetree/bindings/graph.txt. The following table lists for each supported model the port number corresponding to each DU output. - Port0 Port1 Port2 Port3 + Port0 Port1 Port2 Port3 ----------------------------------------------------------------------------- - R8A7743 (RZ/G1M) DPAD 0 LVDS 0 - - - R8A7745 (RZ/G1E) DPAD 0 DPAD 1 - - - R8A7779 (R-Car H1) DPAD 0 DPAD 1 - - - R8A7790 (R-Car H2) DPAD 0 LVDS 0 LVDS 1 - - R8A7791 (R-Car M2-W) DPAD 0 LVDS 0 - - - R8A7792 (R-Car V2H) DPAD 0 DPAD 1 - - - R8A7793 (R-Car M2-N) DPAD 0 LVDS 0 - - - R8A7794 (R-Car E2) DPAD 0 DPAD 1 - - - R8A7795 (R-Car H3) DPAD 0 HDMI 0 HDMI 1 LVDS 0 - R8A7796 (R-Car M3-W) DPAD 0 HDMI 0 LVDS 0 - - R8A77970 (R-Car V3M) DPAD 0 LVDS 0 - - - R8A77995 (R-Car D3) DPAD 0 LVDS 0 LVDS 1 - + R8A7743 (RZ/G1M) DPAD 0 LVDS 0 - - + R8A7745 (RZ/G1E) DPAD 0 DPAD 1 - - + R8A7779 (R-Car H1) DPAD 0 DPAD 1 - - + R8A7790 (R-Car H2) DPAD 0 LVDS 0 LVDS 1 - + R8A7791 (R-Car M2-W) DPAD 0 LVDS 0 - - + R8A7792 (R-Car V2H) DPAD 0 DPAD 1 - - + R8A7793 (R-Car M2-N) DPAD 0 LVDS 0 - - + R8A7794 (R-Car E2) DPAD 0 DPAD 1 - - + R8A7795 (R-Car H3) DPAD 0 HDMI 0 HDMI 1 LVDS 0 + R8A7796 (R-Car M3-W) DPAD 0 HDMI 0 LVDS 0 - + R8A77965 (R-Car M3-N) DPAD 0 HDMI 0 LVDS 0 - + R8A77970 (R-Car V3M) DPAD 0 LVDS 0 - - + R8A77995 (R-Car D3) DPAD 0 LVDS 0 LVDS 1 - Example: R8A7795 (R-Car H3) ES2.0 DU diff --git a/Documentation/devicetree/bindings/display/sunxi/sun6i-dsi.txt b/Documentation/devicetree/bindings/display/sunxi/sun6i-dsi.txt new file mode 100644 index 0000000000000000000000000000000000000000..6a6cf5de08b0c445718ab71a8071a30e8d13a300 --- /dev/null +++ b/Documentation/devicetree/bindings/display/sunxi/sun6i-dsi.txt @@ -0,0 +1,93 @@ +Allwinner A31 DSI Encoder +========================= + +The DSI pipeline consists of two separate blocks: the DSI controller +itself, and its associated D-PHY. + +DSI Encoder +----------- + +The DSI Encoder generates the DSI signal from the TCON's. + +Required properties: + - compatible: value must be one of: + * allwinner,sun6i-a31-mipi-dsi + - reg: base address and size of memory-mapped region + - interrupts: interrupt associated to this IP + - clocks: phandles to the clocks feeding the DSI encoder + * bus: the DSI interface clock + * mod: the DSI module clock + - clock-names: the clock names mentioned above + - phys: phandle to the D-PHY + - phy-names: must be "dphy" + - resets: phandle to the reset controller driving the encoder + + - ports: A ports node with endpoint definitions as defined in + Documentation/devicetree/bindings/media/video-interfaces.txt. The + first port should be the input endpoint, usually coming from the + associated TCON. + +Any MIPI-DSI device attached to this should be described according to +the bindings defined in ../mipi-dsi-bus.txt + +D-PHY +----- + +Required properties: + - compatible: value must be one of: + * allwinner,sun6i-a31-mipi-dphy + - reg: base address and size of memory-mapped region + - clocks: phandles to the clocks feeding the DSI encoder + * bus: the DSI interface clock + * mod: the DSI module clock + - clock-names: the clock names mentioned above + - resets: phandle to the reset controller driving the encoder + +Example: + +dsi0: dsi@1ca0000 { + compatible = "allwinner,sun6i-a31-mipi-dsi"; + reg = <0x01ca0000 0x1000>; + interrupts = ; + clocks = <&ccu CLK_BUS_MIPI_DSI>, + <&ccu CLK_DSI_SCLK>; + clock-names = "bus", "mod"; + resets = <&ccu RST_BUS_MIPI_DSI>; + phys = <&dphy0>; + phy-names = "dphy"; + #address-cells = <1>; + #size-cells = <0>; + + panel@0 { + compatible = "bananapi,lhr050h41", "ilitek,ili9881c"; + reg = <0>; + power-gpios = <&pio 1 7 GPIO_ACTIVE_HIGH>; /* PB07 */ + reset-gpios = <&r_pio 0 5 GPIO_ACTIVE_LOW>; /* PL05 */ + backlight = <&pwm_bl>; + }; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + + dsi0_in_tcon0: endpoint { + remote-endpoint = <&tcon0_out_dsi0>; + }; + }; + }; +}; + +dphy0: d-phy@1ca1000 { + compatible = "allwinner,sun6i-a31-mipi-dphy"; + reg = <0x01ca1000 0x1000>; + clocks = <&ccu CLK_BUS_MIPI_DSI>, + <&ccu CLK_DSI_DPHY>; + clock-names = "bus", "mod"; + resets = <&ccu RST_BUS_MIPI_DSI>; + #phy-cells = <0>; +}; diff --git a/Documentation/devicetree/bindings/display/tilcdc/tilcdc.txt b/Documentation/devicetree/bindings/display/tilcdc/tilcdc.txt index 6fddb4f4f71a45f0fc001b7f904a89f6e50948b3..3055d5c2c04e0ab796215803196c7590a69e02be 100644 --- a/Documentation/devicetree/bindings/display/tilcdc/tilcdc.txt +++ b/Documentation/devicetree/bindings/display/tilcdc/tilcdc.txt @@ -36,7 +36,7 @@ Optional nodes: - port/ports: to describe a connection to an external encoder. The binding follows Documentation/devicetree/bindings/graph.txt and - suppors a single port with a single endpoint. + supports a single port with a single endpoint. - See also Documentation/devicetree/bindings/display/tilcdc/panel.txt and Documentation/devicetree/bindings/display/tilcdc/tfp410.txt for connecting diff --git a/Documentation/devicetree/bindings/dma/k3dma.txt b/Documentation/devicetree/bindings/dma/k3dma.txt index 23f8d712c3ce531c31023c21cffde12470212ffc..4945aeac4dc4f0b12499a6b947227fcf785325f5 100644 --- a/Documentation/devicetree/bindings/dma/k3dma.txt +++ b/Documentation/devicetree/bindings/dma/k3dma.txt @@ -23,7 +23,6 @@ Controller: dma-requests = <27>; interrupts = <0 12 4>; clocks = <&pclk>; - status = "disable"; }; Client: diff --git a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt index 61315eaa76606d777a05d61a3caa3dbf845bd81a..b1ba639554c087c0de5bab69b3b6abb208c6931a 100644 --- a/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,rcar-dmac.txt @@ -29,6 +29,7 @@ Required Properties: - "renesas,dmac-r8a77965" (R-Car M3-N) - "renesas,dmac-r8a77970" (R-Car V3M) - "renesas,dmac-r8a77980" (R-Car V3H) + - "renesas,dmac-r8a77995" (R-Car D3) - reg: base address and length of the registers block for the DMAC diff --git a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt index 9dc935e24e558f4f6edaa166a29fbd038bc66d48..482e54362d3e0ba60bd1b5f731f14854d0168c30 100644 --- a/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt +++ b/Documentation/devicetree/bindings/dma/renesas,usb-dmac.txt @@ -12,6 +12,8 @@ Required Properties: - "renesas,r8a7795-usb-dmac" (R-Car H3) - "renesas,r8a7796-usb-dmac" (R-Car M3-W) - "renesas,r8a77965-usb-dmac" (R-Car M3-N) + - "renesas,r8a77990-usb-dmac" (R-Car E3) + - "renesas,r8a77995-usb-dmac" (R-Car D3) - reg: base address and length of the registers block for the DMAC - interrupts: interrupt specifiers for the DMAC, one for each entry in interrupt-names. diff --git a/Documentation/devicetree/bindings/dma/ti-edma.txt b/Documentation/devicetree/bindings/dma/ti-edma.txt index 66026dcf53e12cfa8077a60480cefe7a8c6cff18..3f15f6644527313057e2ca8d90bf5e2f51296a3e 100644 --- a/Documentation/devicetree/bindings/dma/ti-edma.txt +++ b/Documentation/devicetree/bindings/dma/ti-edma.txt @@ -190,7 +190,6 @@ mmc0: mmc@23000000 { power-domains = <&k2g_pds 0xb>; clocks = <&k2g_clks 0xb 1>, <&k2g_clks 0xb 2>; clock-names = "fck", "mmchsdb_fck"; - status = "disabled"; }; ------------------------------------------------------------------------------ diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt b/Documentation/devicetree/bindings/edac/socfpga-eccmgr.txt similarity index 87% rename from Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt rename to Documentation/devicetree/bindings/edac/socfpga-eccmgr.txt index 4a1714f96babb4c3c2b51c1a1556cd17c9608e23..5626560a6cfdff805ebd1460a14ec2127747de4a 100644 --- a/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt +++ b/Documentation/devicetree/bindings/edac/socfpga-eccmgr.txt @@ -231,3 +231,38 @@ Example: <48 IRQ_TYPE_LEVEL_HIGH>; }; }; + +Stratix10 SoCFPGA ECC Manager +The Stratix10 SoC ECC Manager handles the IRQs for each peripheral +in a shared register similar to the Arria10. However, ECC requires +access to registers that can only be read from Secure Monitor with +SMC calls. Therefore the device tree is slightly different. + +Required Properties: +- compatible : Should be "altr,socfpga-s10-ecc-manager" +- interrupts : Should be single bit error interrupt, then double bit error + interrupt. +- interrupt-controller : boolean indicator that ECC Manager is an interrupt controller +- #interrupt-cells : must be set to 2. + +Subcomponents: + +SDRAM ECC +Required Properties: +- compatible : Should be "altr,sdram-edac-s10" +- interrupts : Should be single bit error interrupt, then double bit error + interrupt, in this order. + +Example: + + eccmgr { + compatible = "altr,socfpga-s10-ecc-manager"; + interrupts = <0 15 4>, <0 95 4>; + interrupt-controller; + #interrupt-cells = <2>; + + sdramedac { + compatible = "altr,sdram-edac-s10"; + interrupts = <16 4>, <48 4>; + }; + }; diff --git a/Documentation/devicetree/bindings/firmware/qcom,scm.txt b/Documentation/devicetree/bindings/firmware/qcom,scm.txt index 7b40054be0d8c76a3196cdb9fd2cc093ce63b463..fcf6979c0b6d3e0ba95f22ea70e56d103ac77d1e 100644 --- a/Documentation/devicetree/bindings/firmware/qcom,scm.txt +++ b/Documentation/devicetree/bindings/firmware/qcom,scm.txt @@ -11,9 +11,10 @@ Required properties: * "qcom,scm-msm8660" for MSM8660 platforms * "qcom,scm-msm8690" for MSM8690 platforms * "qcom,scm-msm8996" for MSM8996 platforms + * "qcom,scm-ipq4019" for IPQ4019 platforms * "qcom,scm" for later processors (MSM8916, APQ8084, MSM8974, etc) - clocks: One to three clocks may be required based on compatible. - * No clock required for "qcom,scm-msm8996" + * No clock required for "qcom,scm-msm8996", "qcom,scm-ipq4019" * Only core clock required for "qcom,scm-apq8064", "qcom,scm-msm8660", and "qcom,scm-msm8960" * Core, iface, and bus clocks required for "qcom,scm" - clock-names: Must contain "core" for the core clock, "iface" for the interface diff --git a/Documentation/devicetree/bindings/fpga/lattice-machxo2-spi.txt b/Documentation/devicetree/bindings/fpga/lattice-machxo2-spi.txt new file mode 100644 index 0000000000000000000000000000000000000000..a8c362eb160cc7f0a686c850ccc9703edc2a8dca --- /dev/null +++ b/Documentation/devicetree/bindings/fpga/lattice-machxo2-spi.txt @@ -0,0 +1,29 @@ +Lattice MachXO2 Slave SPI FPGA Manager + +Lattice MachXO2 FPGAs support a method of loading the bitstream over +'slave SPI' interface. + +See 'MachXO2ProgrammingandConfigurationUsageGuide.pdf' on www.latticesemi.com + +Required properties: +- compatible: should contain "lattice,machxo2-slave-spi" +- reg: spi chip select of the FPGA + +Example for full FPGA configuration: + + fpga-region0 { + compatible = "fpga-region"; + fpga-mgr = <&fpga_mgr_spi>; + #address-cells = <0x1>; + #size-cells = <0x1>; + }; + + spi1: spi@2000 { + ... + + fpga_mgr_spi: fpga-mgr@0 { + compatible = "lattice,machxo2-slave-spi"; + spi-max-frequency = <8000000>; + reg = <0>; + }; + }; diff --git a/Documentation/devicetree/bindings/fsi/fsi-master-gpio.txt b/Documentation/devicetree/bindings/fsi/fsi-master-gpio.txt index a767259dedadc51cd333c284f121902858046e5a..1e442450747f9c9c9c68a2a71893255359acabdf 100644 --- a/Documentation/devicetree/bindings/fsi/fsi-master-gpio.txt +++ b/Documentation/devicetree/bindings/fsi/fsi-master-gpio.txt @@ -11,6 +11,10 @@ Optional properties: - trans-gpios = ; : GPIO for voltage translator enable - mux-gpios = ; : GPIO for pin multiplexing with other functions (eg, external FSI masters) + - no-gpio-delays; : Don't add extra delays between GPIO + accesses. This is useful when the HW + GPIO block is running at a low enough + frequency. Examples: diff --git a/Documentation/devicetree/bindings/gpio/gpio-pca953x.txt b/Documentation/devicetree/bindings/gpio/gpio-pca953x.txt index d2a9376828369b33aaa4108ac5def0343f669409..88f22866550750f93d02e90b4026d5df28b70799 100644 --- a/Documentation/devicetree/bindings/gpio/gpio-pca953x.txt +++ b/Documentation/devicetree/bindings/gpio/gpio-pca953x.txt @@ -31,10 +31,15 @@ Required properties: ti,tca9554 onnn,pca9654 exar,xra1202 + - gpio-controller: if used as gpio expander. + - #gpio-cells: if used as gpio expander. + - interrupt-controller: if to be used as interrupt expander. + - #interrupt-cells: if to be used as interrupt expander. Optional properties: - reset-gpios: GPIO specification for the RESET input. This is an active low signal to the PCA953x. + - vcc-supply: power supply regulator. Example: @@ -47,3 +52,32 @@ Example: interrupt-parent = <&gpio3>; interrupts = <23 IRQ_TYPE_LEVEL_LOW>; }; + + +Example with Interrupts: + + + gpio99: gpio@22 { + compatible = "nxp,pcal6524"; + reg = <0x22>; + interrupt-parent = <&gpio6>; + interrupts = <1 IRQ_TYPE_EDGE_FALLING>; /* gpio6_161 */ + interrupt-controller; + #interrupt-cells = <2>; + vcc-supply = <&vdds_1v8_main>; + gpio-controller; + #gpio-cells = <2>; + gpio-line-names = + "hdmi-ct-hpd", "hdmi.ls-oe", "p02", "p03", "vibra", "fault2", "p06", "p07", + "en-usb", "en-host1", "en-host2", "chg-int", "p14", "p15", "mic-int", "en-modem", + "shdn-hs-amp", "chg-status+red", "green", "blue", "en-esata", "fault1", "p26", "p27"; + }; + + ts3a227@3b { + compatible = "ti,ts3a227e"; + reg = <0x3b>; + interrupt-parent = <&gpio99>; + interrupts = <14 IRQ_TYPE_EDGE_RISING>; + ti,micbias = <0>; /* 2.1V */ + }; + diff --git a/Documentation/devicetree/bindings/gpio/nintendo,hollywood-gpio.txt b/Documentation/devicetree/bindings/gpio/nintendo,hollywood-gpio.txt index 20fc72d9e61e5721e56e0aeb0479682f921fd154..45a61b46228712592029e75fe117262ba47d9112 100644 --- a/Documentation/devicetree/bindings/gpio/nintendo,hollywood-gpio.txt +++ b/Documentation/devicetree/bindings/gpio/nintendo,hollywood-gpio.txt @@ -1,7 +1,7 @@ Nintendo Wii (Hollywood) GPIO controller Required properties: -- compatible: "nintendo,hollywood-gpio +- compatible: "nintendo,hollywood-gpio" - reg: Physical base address and length of the controller's registers. - gpio-controller: Marks the device node as a GPIO controller. - #gpio-cells: Should be <2>. The first cell is the pin number and the diff --git a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt index 9474138d776ed58da02dd925105dd89daa3c6cb4..378f1322211e9a382a65fe0618c1557fc3577852 100644 --- a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt +++ b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt @@ -5,6 +5,7 @@ Required Properties: - compatible: should contain one or more of the following: - "renesas,gpio-r8a7743": for R8A7743 (RZ/G1M) compatible GPIO controller. - "renesas,gpio-r8a7745": for R8A7745 (RZ/G1E) compatible GPIO controller. + - "renesas,gpio-r8a77470": for R8A77470 (RZ/G1C) compatible GPIO controller. - "renesas,gpio-r8a7778": for R8A7778 (R-Car M1) compatible GPIO controller. - "renesas,gpio-r8a7779": for R8A7779 (R-Car H1) compatible GPIO controller. - "renesas,gpio-r8a7790": for R8A7790 (R-Car H2) compatible GPIO controller. @@ -14,7 +15,9 @@ Required Properties: - "renesas,gpio-r8a7794": for R8A7794 (R-Car E2) compatible GPIO controller. - "renesas,gpio-r8a7795": for R8A7795 (R-Car H3) compatible GPIO controller. - "renesas,gpio-r8a7796": for R8A7796 (R-Car M3-W) compatible GPIO controller. + - "renesas,gpio-r8a77965": for R8A77965 (R-Car M3-N) compatible GPIO controller. - "renesas,gpio-r8a77970": for R8A77970 (R-Car V3M) compatible GPIO controller. + - "renesas,gpio-r8a77990": for R8A77990 (R-Car E3) compatible GPIO controller. - "renesas,gpio-r8a77995": for R8A77995 (R-Car D3) compatible GPIO controller. - "renesas,rcar-gen1-gpio": for a generic R-Car Gen1 GPIO controller. - "renesas,rcar-gen2-gpio": for a generic R-Car Gen2 or RZ/G1 GPIO controller. diff --git a/Documentation/devicetree/bindings/gpio/snps-dwapb-gpio.txt b/Documentation/devicetree/bindings/gpio/snps-dwapb-gpio.txt index 4a75da7051bd218c12ef8ff3bf49e555e2e0991c..3c1118bc67f50cb1214327a06d4ff57366d92eaf 100644 --- a/Documentation/devicetree/bindings/gpio/snps-dwapb-gpio.txt +++ b/Documentation/devicetree/bindings/gpio/snps-dwapb-gpio.txt @@ -26,8 +26,13 @@ controller. the second encodes the triger flags encoded as described in Documentation/devicetree/bindings/interrupt-controller/interrupts.txt - interrupt-parent : The parent interrupt controller. -- interrupts : The interrupt to the parent controller raised when GPIOs - generate the interrupts. +- interrupts : The interrupts to the parent controller raised when GPIOs + generate the interrupts. If the controller provides one combined interrupt + for all GPIOs, specify a single interrupt. If the controller provides one + interrupt for each GPIO, provide a list of interrupts that correspond to each + of the GPIO pins. When specifying multiple interrupts, if any are unconnected, + use the interrupts-extended property to specify the interrupts and set the + interrupt controller handle for unused interrupts to 0. - snps,nr-gpios : The number of pins in the port, a single cell. - resets : Reset line for the controller. diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt index 039219df05c5f69836a39bac48bac64ce33d476a..18a2cde2e5f3a96c89067ea5cafd60fe4c71a0bc 100644 --- a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt +++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt @@ -34,7 +34,7 @@ Optional properties: - mali-supply : Phandle to regulator for the Mali device. Refer to Documentation/devicetree/bindings/regulator/regulator.txt for details. -- operating-points-v2 : Refer to Documentation/devicetree/bindings/power/opp.txt +- operating-points-v2 : Refer to Documentation/devicetree/bindings/opp/opp.txt for details. diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-utgard.txt b/Documentation/devicetree/bindings/gpu/arm,mali-utgard.txt index c1f65d1dac1db5db178f8766e61eb8a22bf88f1f..63cd91176a688b5faf24b37dfd959dbeb1c7376d 100644 --- a/Documentation/devicetree/bindings/gpu/arm,mali-utgard.txt +++ b/Documentation/devicetree/bindings/gpu/arm,mali-utgard.txt @@ -44,7 +44,7 @@ Optional properties: - memory-region: Memory region to allocate from, as defined in - Documentation/devicetree/bindi/reserved-memory/reserved-memory.txt + Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt - mali-supply: Phandle to regulator for the Mali device, as defined in diff --git a/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.txt b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.txt new file mode 100644 index 0000000000000000000000000000000000000000..c907aa8dd755eb8083d14cdc8716b875c9820e99 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.txt @@ -0,0 +1,28 @@ +Broadcom V3D GPU + +Only the Broadcom V3D 3.x and newer GPUs are covered by this binding. +For V3D 2.x, see brcm,bcm-vc4.txt. + +Required properties: +- compatible: Should be "brcm,7268-v3d" or "brcm,7278-v3d" +- reg: Physical base addresses and lengths of the register areas +- reg-names: Names for the register areas. The "hub", "bridge", and "core0" + register areas are always required. The "gca" register area + is required if the GCA cache controller is present. +- interrupts: The interrupt numbers. The first interrupt is for the hub, + while the following interrupts are for the cores. + See bindings/interrupt-controller/interrupts.txt + +Optional properties: +- clocks: The core clock the unit runs on + +v3d { + compatible = "brcm,7268-v3d"; + reg = <0xf1204000 0x100>, + <0xf1200000 0x4000>, + <0xf1208000 0x4000>, + <0xf1204100 0x100>; + reg-names = "bridge", "hub", "core0", "gca"; + interrupts = <0 78 4>, + <0 77 4>; +}; diff --git a/Documentation/devicetree/bindings/gpu/samsung-scaler.txt b/Documentation/devicetree/bindings/gpu/samsung-scaler.txt new file mode 100644 index 0000000000000000000000000000000000000000..9c3d98105dfd802aba4a396fde9fe7184e140f7d --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/samsung-scaler.txt @@ -0,0 +1,27 @@ +* Samsung Exynos Image Scaler + +Required properties: + - compatible : value should be one of the following: + (a) "samsung,exynos5420-scaler" for Scaler IP in Exynos5420 + (b) "samsung,exynos5433-scaler" for Scaler IP in Exynos5433 + + - reg : Physical base address of the IP registers and length of memory + mapped region. + + - interrupts : Interrupt specifier for scaler interrupt, according to format + specific to interrupt parent. + + - clocks : Clock specifier for scaler clock, according to generic clock + bindings. (See Documentation/devicetree/bindings/clock/exynos*.txt) + + - clock-names : Names of clocks. For exynos scaler, it should be "mscl" + on 5420 and "pclk", "aclk" and "aclk_xiu" on 5433. + +Example: + scaler@12800000 { + compatible = "samsung,exynos5420-scaler"; + reg = <0x12800000 0x1294>; + interrupts = <0 220 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&clock CLK_MSCL0>; + clock-names = "mscl"; + }; diff --git a/Documentation/devicetree/bindings/hwmon/gpio-fan.txt b/Documentation/devicetree/bindings/hwmon/gpio-fan.txt index 439a7430fc6827e15b79811b8cfcc176fed205f7..2becdcfdc840c60e6d02ffe0f3893c00eb80321d 100644 --- a/Documentation/devicetree/bindings/hwmon/gpio-fan.txt +++ b/Documentation/devicetree/bindings/hwmon/gpio-fan.txt @@ -11,7 +11,7 @@ Optional properties: must have the RPM values in ascending order. - alarm-gpios: This pin going active indicates something is wrong with the fan, and a udev event will be fired. -- cooling-cells: If used as a cooling device, must be <2> +- #cooling-cells: If used as a cooling device, must be <2> Also see: Documentation/devicetree/bindings/thermal/thermal.txt min and max states are derived from the speed-map of the fan. diff --git a/Documentation/devicetree/bindings/hwmon/ltc2990.txt b/Documentation/devicetree/bindings/hwmon/ltc2990.txt new file mode 100644 index 0000000000000000000000000000000000000000..f92f54029e84c68f7a571a4b222d43e954e856dc --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/ltc2990.txt @@ -0,0 +1,36 @@ +ltc2990: Linear Technology LTC2990 power monitor + +Required properties: +- compatible: Must be "lltc,ltc2990" +- reg: I2C slave address +- lltc,meas-mode: + An array of two integers for configuring the chip measurement mode. + + The first integer defines the bits 2..0 in the control register. In all + cases the internal temperature and supply voltage are measured. In + addition the following input measurements are enabled per mode: + + 0: V1, V2, TR2 + 1: V1-V2, TR2 + 2: V1-V2, V3, V4 + 3: TR1, V3, V4 + 4: TR1, V3-V4 + 5: TR1, TR2 + 6: V1-V2, V3-V4 + 7: V1, V2, V3, V4 + + The second integer defines the bits 4..3 in the control register. This + allows a subset of the measurements to be enabled: + + 0: Internal temperature and supply voltage only + 1: TR1, V1 or V1-V2 only per mode + 2: TR2, V3 or V3-V4 only per mode + 3: All measurements per mode + +Example: + +ltc2990@4c { + compatible = "lltc,ltc2990"; + reg = <0x4c>; + lltc,meas-mode = <7 3>; /* V1, V2, V3, V4 */ +}; diff --git a/Documentation/devicetree/bindings/i2c/i2c-davinci.txt b/Documentation/devicetree/bindings/i2c/i2c-davinci.txt index 64e6e656c345c29c6b626100ffe74e47398959d1..b745f3706120f0605f43642af2750832832c75db 100644 --- a/Documentation/devicetree/bindings/i2c/i2c-davinci.txt +++ b/Documentation/devicetree/bindings/i2c/i2c-davinci.txt @@ -24,7 +24,7 @@ Recommended properties : - clock-frequency : desired I2C bus clock frequency in Hz. - ti,has-pfunc: boolean; if defined, it indicates that SoC supports PFUNC registers. PFUNC registers allow to switch I2C pins to function as - GPIOs, so they can by toggled manually. + GPIOs, so they can be toggled manually. Example (enbw_cmc board): i2c@1c22000 { diff --git a/Documentation/devicetree/bindings/i2c/i2c-rcar.txt b/Documentation/devicetree/bindings/i2c/i2c-rcar.txt index 4a7811ecd95447c8d1026b578f0bd557b65a07d7..7ce8fae55537046beb460436c025fb7bed2cddd2 100644 --- a/Documentation/devicetree/bindings/i2c/i2c-rcar.txt +++ b/Documentation/devicetree/bindings/i2c/i2c-rcar.txt @@ -15,6 +15,7 @@ Required properties: "renesas,i2c-r8a7796" if the device is a part of a R8A7796 SoC. "renesas,i2c-r8a77965" if the device is a part of a R8A77965 SoC. "renesas,i2c-r8a77970" if the device is a part of a R8A77970 SoC. + "renesas,i2c-r8a77980" if the device is a part of a R8A77980 SoC. "renesas,i2c-r8a77995" if the device is a part of a R8A77995 SoC. "renesas,rcar-gen1-i2c" for a generic R-Car Gen1 compatible device. "renesas,rcar-gen2-i2c" for a generic R-Car Gen2 or RZ/G1 compatible diff --git a/Documentation/devicetree/bindings/i2c/i2c-s3c2410.txt b/Documentation/devicetree/bindings/i2c/i2c-s3c2410.txt index 89b3250f049b6ead36b110df609a1ab340ac0fb8..66ae46d3bc2ff410b5be06a8f1fd288c306c6c4e 100644 --- a/Documentation/devicetree/bindings/i2c/i2c-s3c2410.txt +++ b/Documentation/devicetree/bindings/i2c/i2c-s3c2410.txt @@ -8,9 +8,7 @@ Required properties: (b) "samsung, s3c2440-i2c", for i2c compatible with s3c2440 i2c. (c) "samsung, s3c2440-hdmiphy-i2c", for s3c2440-like i2c used inside HDMIPHY block found on several samsung SoCs - (d) "samsung, exynos5440-i2c", for s3c2440-like i2c used - on EXYNOS5440 which does not need GPIO configuration. - (e) "samsung, exynos5-sata-phy-i2c", for s3c2440-like i2c used as + (d) "samsung, exynos5-sata-phy-i2c", for s3c2440-like i2c used as a host to SATA PHY controller on an internal bus. - reg: physical base address of the controller and length of memory mapped region. diff --git a/Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt b/Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt index 1e6ee3deb4fa214bfca84403d4da1c05df4f1121..d1acd5ea2737ff8b49b5043c34bb1ae38531b2ba 100644 --- a/Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt +++ b/Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt @@ -7,6 +7,7 @@ Required properties: - "amlogic,meson-gxbb-saradc" for GXBB - "amlogic,meson-gxl-saradc" for GXL - "amlogic,meson-gxm-saradc" for GXM + - "amlogic,meson-axg-saradc" for AXG along with the generic "amlogic,meson-saradc" - reg: the physical base address and length of the registers - interrupts: the interrupt indicating end of sampling diff --git a/Documentation/devicetree/bindings/iio/adc/mcp320x.txt b/Documentation/devicetree/bindings/iio/adc/mcp320x.txt index 7d64753df94930b0469ade79168af8c32b1d3370..56373d643f768abfac1380db28e8daa72a6bb071 100644 --- a/Documentation/devicetree/bindings/iio/adc/mcp320x.txt +++ b/Documentation/devicetree/bindings/iio/adc/mcp320x.txt @@ -49,7 +49,7 @@ Required properties: Examples: spi_controller { mcp3x0x@0 { - compatible = "mcp3002"; + compatible = "microchip,mcp3002"; reg = <0>; spi-max-frequency = <1000000>; vref-supply = <&vref_reg>; diff --git a/Documentation/devicetree/bindings/arm/samsung/exynos-adc.txt b/Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/samsung/exynos-adc.txt rename to Documentation/devicetree/bindings/iio/adc/samsung,exynos-adc.txt diff --git a/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt b/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt index e8bb8243e92c1c2a54090ab397e45f8bfbd55e5f..f1ead43a1a957783c554546ddfb23adcf0dba29f 100644 --- a/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt +++ b/Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt @@ -24,8 +24,11 @@ Required properties: - compatible: Should be one of: "st,stm32f4-adc-core" "st,stm32h7-adc-core" + "st,stm32mp1-adc-core" - reg: Offset and length of the ADC block register set. -- interrupts: Must contain the interrupt for ADC block. +- interrupts: One or more interrupts for ADC block. Some parts like stm32f4 + and stm32h7 share a common ADC interrupt line. stm32mp1 has two separate + interrupt lines, one for each ADC within ADC block. - clocks: Core can use up to two clocks, depending on part used: - "adc" clock: for the analog circuitry, common to all ADCs. It's required on stm32f4. @@ -53,6 +56,7 @@ Required properties: - compatible: Should be one of: "st,stm32f4-adc" "st,stm32h7-adc" + "st,stm32mp1-adc" - reg: Offset of ADC instance in ADC block (e.g. may be 0x0, 0x100, 0x200). - clocks: Input clock private to this ADC instance. It's required only on stm32f4, that has per instance clock input for registers access. diff --git a/Documentation/devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.txt b/Documentation/devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.txt index ed7520d1d0518d96be3d543cb5da59f7dd40a736..75ba25d062e1833fd28a777a8ad2991c9b1d9efa 100644 --- a/Documentation/devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.txt +++ b/Documentation/devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.txt @@ -8,14 +8,16 @@ It is mainly targeted for: - PDM microphones (audio digital microphone) It features up to 8 serial digital interfaces (SPI or Manchester) and -up to 4 filters on stm32h7. +up to 4 filters on stm32h7 or 6 filters on stm32mp1. Each child node match with a filter instance. Contents of a STM32 DFSDM root node: ------------------------------------ Required properties: -- compatible: Should be "st,stm32h7-dfsdm". +- compatible: Should be one of: + "st,stm32h7-dfsdm" + "st,stm32mp1-dfsdm" - reg: Offset and length of the DFSDM block register set. - clocks: IP and serial interfaces clocking. Should be set according to rcc clock ID and "clock-names". @@ -45,6 +47,7 @@ Required properties: "st,stm32-dfsdm-adc" for sigma delta ADCs "st,stm32-dfsdm-dmic" for audio digital microphone. - reg: Specifies the DFSDM filter instance used. + Valid values are from 0 to 3 on stm32h7, 0 to 5 on stm32mp1. - interrupts: IRQ lines connected to each DFSDM filter instance. - st,adc-channels: List of single-ended channels muxed for this ADC. valid values: diff --git a/Documentation/devicetree/bindings/iio/afe/current-sense-amplifier.txt b/Documentation/devicetree/bindings/iio/afe/current-sense-amplifier.txt new file mode 100644 index 0000000000000000000000000000000000000000..821b61b8c542ff925e192b175efc5be0516de962 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/afe/current-sense-amplifier.txt @@ -0,0 +1,26 @@ +Current Sense Amplifier +======================= + +When an io-channel measures the output voltage from a current sense +amplifier, the interesting measurement is almost always the current +through the sense resistor, not the voltage output. This binding +describes such a current sense circuit. + +Required properties: +- compatible : "current-sense-amplifier" +- io-channels : Channel node of a voltage io-channel. +- sense-resistor-micro-ohms : The sense resistance in microohms. + +Optional properties: +- sense-gain-mult: Amplifier gain multiplier. The default is <1>. +- sense-gain-div: Amplifier gain divider. The default is <1>. + +Example: + +sysi { + compatible = "current-sense-amplifier"; + io-channels = <&tiadc 0>; + + sense-resistor-micro-ohms = <20000>; + sense-gain-mul = <50>; +}; diff --git a/Documentation/devicetree/bindings/iio/afe/current-sense-shunt.txt b/Documentation/devicetree/bindings/iio/afe/current-sense-shunt.txt new file mode 100644 index 0000000000000000000000000000000000000000..0f67108a07b6327bfa0f1de99b8781715a0c2734 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/afe/current-sense-shunt.txt @@ -0,0 +1,41 @@ +Current Sense Shunt +=================== + +When an io-channel measures the voltage over a current sense shunt, +the interesting measurement is almost always the current through the +shunt, not the voltage over it. This binding describes such a current +sense circuit. + +Required properties: +- compatible : "current-sense-shunt" +- io-channels : Channel node of a voltage io-channel. +- shunt-resistor-micro-ohms : The shunt resistance in microohms. + +Example: +The system current is measured by measuring the voltage over a +3.3 ohms shunt resistor. + +sysi { + compatible = "current-sense-shunt"; + io-channels = <&tiadc 0>; + + /* Divide the voltage by 3300000/1000000 (or 3.3) for the current. */ + shunt-resistor-micro-ohms = <3300000>; +}; + +&i2c { + tiadc: adc@48 { + compatible = "ti,ads1015"; + reg = <0x48>; + #io-channel-cells = <1>; + + #address-cells = <1>; + #size-cells = <0>; + + channel@0 { /* IN0,IN1 differential */ + reg = <0>; + ti,gain = <1>; + ti,datarate = <4>; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/iio/afe/voltage-divider.txt b/Documentation/devicetree/bindings/iio/afe/voltage-divider.txt new file mode 100644 index 0000000000000000000000000000000000000000..b452a8406107625e9cc8bf6be1eb56ba80bd7240 --- /dev/null +++ b/Documentation/devicetree/bindings/iio/afe/voltage-divider.txt @@ -0,0 +1,53 @@ +Voltage divider +=============== + +When an io-channel measures the midpoint of a voltage divider, the +interesting voltage is often the voltage over the full resistance +of the divider. This binding describes the voltage divider in such +a curcuit. + + Vin ----. + | + .-----. + | R | + '-----' + | + +---- Vout + | + .-----. + | Rout| + '-----' + | + GND + +Required properties: +- compatible : "voltage-divider" +- io-channels : Channel node of a voltage io-channel measuring Vout. +- output-ohms : Resistance Rout over which the output voltage is measured. + See full-ohms. +- full-ohms : Resistance R + Rout for the full divider. The io-channel + is scaled by the Rout / (R + Rout) quotient. + +Example: +The system voltage is circa 12V, but divided down with a 22/222 +voltage divider (R = 200 Ohms, Rout = 22 Ohms) and fed to an ADC. + +sysv { + compatible = "voltage-divider"; + io-channels = <&maxadc 1>; + + /* Scale the system voltage by 22/222 to fit the ADC range. */ + output-ohms = <22>; + full-ohms = <222>; /* 200 + 22 */ +}; + +&spi { + maxadc: adc@0 { + compatible = "maxim,max1027"; + reg = <0>; + #io-channel-cells = <1>; + interrupt-parent = <&gpio5>; + interrupts = <15 IRQ_TYPE_EDGE_RISING>; + spi-max-frequency = <1000000>; + }; +}; diff --git a/Documentation/devicetree/bindings/iio/dac/ltc2632.txt b/Documentation/devicetree/bindings/iio/dac/ltc2632.txt index eb911e5a8ab443dc6364713f20de0f0fca5ae2b1..e0d5fea33031d3614676b45be596a9dc6c3e79bf 100644 --- a/Documentation/devicetree/bindings/iio/dac/ltc2632.txt +++ b/Documentation/devicetree/bindings/iio/dac/ltc2632.txt @@ -12,12 +12,26 @@ Required properties: Property rules described in Documentation/devicetree/bindings/spi/spi-bus.txt apply. In particular, "reg" and "spi-max-frequency" properties must be given. +Optional properties: + - vref-supply: Phandle to the external reference voltage supply. This should + only be set if there is an external reference voltage connected to the VREF + pin. If the property is not set the internal reference is used. + Example: + vref: regulator-vref { + compatible = "regulator-fixed"; + regulator-name = "vref-ltc2632"; + regulator-min-microvolt = <1250000>; + regulator-max-microvolt = <1250000>; + regulator-always-on; + }; + spi_master { dac: ltc2632@0 { compatible = "lltc,ltc2632-l12"; reg = <0>; /* CS0 */ spi-max-frequency = <1000000>; + vref-supply = <&vref>; /* optional */ }; }; diff --git a/Documentation/devicetree/bindings/iio/dac/ti,dac5571.txt b/Documentation/devicetree/bindings/iio/dac/ti,dac5571.txt new file mode 100644 index 0000000000000000000000000000000000000000..03af6b9a4d07c1434eb45e34eb1b5de19df8231c --- /dev/null +++ b/Documentation/devicetree/bindings/iio/dac/ti,dac5571.txt @@ -0,0 +1,24 @@ +* Texas Instruments DAC5571 Family + +Required properties: + - compatible: Should contain + "ti,dac5571" + "ti,dac6571" + "ti,dac7571" + "ti,dac5574" + "ti,dac6574" + "ti,dac7574" + "ti,dac5573" + "ti,dac6573" + "ti,dac7573" + - reg: Should contain the DAC I2C address + +Optional properties: + - vref-supply: The regulator supply for DAC reference voltage + +Example: +dac@0 { + compatible = "ti,dac5571"; + reg = <0x4C>; + vref-supply = <&vdd_supply>; +}; diff --git a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt index 2b4514592f833c9be71e0522ca8e839082470030..5f4777e8cc9e510e4a74c7512d7e6a228a0cd6af 100644 --- a/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt +++ b/Documentation/devicetree/bindings/iio/imu/inv_mpu6050.txt @@ -8,10 +8,16 @@ Required properties: "invensense,mpu6500" "invensense,mpu9150" "invensense,mpu9250" + "invensense,mpu9255" "invensense,icm20608" - reg : the I2C address of the sensor - interrupt-parent : should be the phandle for the interrupt controller - - interrupts : interrupt mapping for GPIO IRQ + - interrupts: interrupt mapping for IRQ. It should be configured with flags + IRQ_TYPE_LEVEL_HIGH, IRQ_TYPE_EDGE_RISING, IRQ_TYPE_LEVEL_LOW or + IRQ_TYPE_EDGE_FALLING. + + Refer to interrupt-controller/interrupts.txt for generic interrupt client node + bindings. Optional properties: - mount-matrix: an optional 3x3 mounting rotation matrix @@ -24,7 +30,7 @@ Example: compatible = "invensense,mpu6050"; reg = <0x68>; interrupt-parent = <&gpio1>; - interrupts = <18 1>; + interrupts = <18 IRQ_TYPE_EDGE_RISING>; mount-matrix = "-0.984807753012208", /* x0 */ "0", /* y0 */ "-0.173648177666930", /* z0 */ @@ -41,7 +47,7 @@ Example: compatible = "invensense,mpu9250"; reg = <0x68>; interrupt-parent = <&gpio3>; - interrupts = <21 1>; + interrupts = <21 IRQ_TYPE_LEVEL_HIGH>; i2c-gate { #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt b/Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt index 1ff1af799c76f3dcad89ee065bbe0f423369f6d7..ef8a8566c63fc24d1a9f1bf8b74fb2bdd7df094d 100644 --- a/Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt +++ b/Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt @@ -6,6 +6,7 @@ Required properties: "st,lsm6ds3h" "st,lsm6dsl" "st,lsm6dsm" + "st,ism330dlc" - reg: i2c address of the sensor / spi cs line Optional properties: diff --git a/Documentation/devicetree/bindings/iio/potentiostat/lmp91000.txt b/Documentation/devicetree/bindings/iio/potentiostat/lmp91000.txt index b9b621e94cd7d0694bbc24a42d1d2a179f37f22e..e6d0c2eb345c4cbf4fca7022549325ffa0a727d5 100644 --- a/Documentation/devicetree/bindings/iio/potentiostat/lmp91000.txt +++ b/Documentation/devicetree/bindings/iio/potentiostat/lmp91000.txt @@ -1,10 +1,13 @@ -* Texas Instruments LMP91000 potentiostat +* Texas Instruments LMP91000 series of potentiostats -http://www.ti.com/lit/ds/symlink/lmp91000.pdf +LMP91000: http://www.ti.com/lit/ds/symlink/lmp91000.pdf +LMP91002: http://www.ti.com/lit/ds/symlink/lmp91002.pdf Required properties: - - compatible: should be "ti,lmp91000" + - compatible: should be one of the following: + "ti,lmp91000" + "ti,lmp91002" - reg: the I2C address of the device - io-channels: the phandle of the iio provider diff --git a/Documentation/devicetree/bindings/input/elan_i2c.txt b/Documentation/devicetree/bindings/input/elan_i2c.txt index ee3242c4ba67a43e67b5b3295637523ecd8bef9c..d80a83583238a16ad0fe493f560028b7df83c605 100644 --- a/Documentation/devicetree/bindings/input/elan_i2c.txt +++ b/Documentation/devicetree/bindings/input/elan_i2c.txt @@ -14,6 +14,7 @@ Optional properties: - pinctrl-0: a phandle pointing to the pin settings for the device (see pinctrl binding [1]). - vcc-supply: a phandle for the regulator supplying 3.3V power. +- elan,trackpoint: touchpad can support a trackpoint (boolean) [0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt [1]: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt diff --git a/Documentation/devicetree/bindings/input/mtk-pmic-keys.txt b/Documentation/devicetree/bindings/input/mtk-pmic-keys.txt new file mode 100644 index 0000000000000000000000000000000000000000..2888d07c2ef0629e9a092619b7f1f9dc04d2d890 --- /dev/null +++ b/Documentation/devicetree/bindings/input/mtk-pmic-keys.txt @@ -0,0 +1,43 @@ +MediaTek MT6397/MT6323 PMIC Keys Device Driver + +There are two key functions provided by MT6397/MT6323 PMIC, pwrkey +and homekey. The key functions are defined as the subnode of the function +node provided by MT6397/MT6323 PMIC that is being defined as one kind +of Muti-Function Device (MFD) + +For MT6397/MT6323 MFD bindings see: +Documentation/devicetree/bindings/mfd/mt6397.txt + +Required properties: +- compatible: "mediatek,mt6397-keys" or "mediatek,mt6323-keys" +- linux,keycodes: See Documentation/devicetree/bindings/input/keys.txt + +Optional Properties: +- wakeup-source: See Documentation/devicetree/bindings/power/wakeup-source.txt +- mediatek,long-press-mode: Long press key shutdown setting, 1 for + pwrkey only, 2 for pwrkey/homekey together, others for disabled. +- power-off-time-sec: See Documentation/devicetree/bindings/input/keys.txt + +Example: + + pmic: mt6397 { + compatible = "mediatek,mt6397"; + + ... + + mt6397keys: mt6397keys { + compatible = "mediatek,mt6397-keys"; + mediatek,long-press-mode = <1>; + power-off-time-sec = <0>; + + power { + linux,keycodes = <116>; + wakeup-source; + }; + + home { + linux,keycodes = <114>; + }; + }; + + }; diff --git a/Documentation/devicetree/bindings/input/rmi4/rmi_2d_sensor.txt b/Documentation/devicetree/bindings/input/rmi4/rmi_2d_sensor.txt index f2c30c8b725dff3f6ba8c2b55b4f4e2e416e97a7..9afffbdf6e285bc3c57673e86b4aac3e805aa855 100644 --- a/Documentation/devicetree/bindings/input/rmi4/rmi_2d_sensor.txt +++ b/Documentation/devicetree/bindings/input/rmi4/rmi_2d_sensor.txt @@ -12,7 +12,7 @@ Additional documentation for F11 can be found at: http://www.synaptics.com/sites/default/files/511-000136-01-Rev-E-RMI4-Interfacing-Guide.pdf Optional Touch Properties: -Description in Documentation/devicetree/bindings/input/touch +Description in Documentation/devicetree/bindings/input/touchscreen - touchscreen-inverted-x - touchscreen-inverted-y - touchscreen-swapped-x-y diff --git a/Documentation/devicetree/bindings/input/rotary-encoder.txt b/Documentation/devicetree/bindings/input/rotary-encoder.txt index f99fe5cdeaec6b6c4d841bfa49603e84495a1b16..a644408b33b8f17d1d00be1177a259e8780a9c97 100644 --- a/Documentation/devicetree/bindings/input/rotary-encoder.txt +++ b/Documentation/devicetree/bindings/input/rotary-encoder.txt @@ -28,7 +28,7 @@ Deprecated properties: This property is deprecated. Instead, a 'steps-per-period ' value should be used, such as "rotary-encoder,steps-per-period = <2>". -See Documentation/input/rotary-encoder.txt for more information. +See Documentation/input/devices/rotary-encoder.rst for more information. Example: diff --git a/Documentation/devicetree/bindings/input/sprd,sc27xx-vibra.txt b/Documentation/devicetree/bindings/input/sprd,sc27xx-vibra.txt new file mode 100644 index 0000000000000000000000000000000000000000..f2ec0d4f2dff47692bd840f4b6a551345af8455a --- /dev/null +++ b/Documentation/devicetree/bindings/input/sprd,sc27xx-vibra.txt @@ -0,0 +1,23 @@ +Spreadtrum SC27xx PMIC Vibrator + +Required properties: +- compatible: should be "sprd,sc2731-vibrator". +- reg: address of vibrator control register. + +Example : + + sc2731_pmic: pmic@0 { + compatible = "sprd,sc2731"; + reg = <0>; + spi-max-frequency = <26000000>; + interrupts = ; + interrupt-controller; + #interrupt-cells = <2>; + #address-cells = <1>; + #size-cells = <0>; + + vibrator@eb4 { + compatible = "sprd,sc2731-vibrator"; + reg = <0xeb4>; + }; + }; diff --git a/Documentation/devicetree/bindings/input/touchscreen/hideep.txt b/Documentation/devicetree/bindings/input/touchscreen/hideep.txt index 121d9b7c79a24cd05e6452bb8b52a14d3d20f46a..1063c30d53f7d0fd7b642d323d32799ba4fb51fe 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/hideep.txt +++ b/Documentation/devicetree/bindings/input/touchscreen/hideep.txt @@ -32,7 +32,7 @@ i2c@00000000 { reg = <0x6c>; interrupt-parent = <&gpx1>; interrupts = <2 IRQ_TYPE_LEVEL_LOW>; - vdd-supply = <&ldo15_reg>"; + vdd-supply = <&ldo15_reg>; vid-supply = <&ldo18_reg>; reset-gpios = <&gpx1 5 0>; touchscreen-size-x = <1080>; diff --git a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt index a83f9a5734cab87aa57ea78b36e240eff6d52d80..89674ad8a097add3b9a6d4c1429435ca2d6a75dc 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.txt @@ -9,11 +9,12 @@ number of interrupt exposed depends on the SoC. Required properties: -- compatible : must have "amlogic,meson8-gpio-intc” and either - “amlogic,meson8-gpio-intc” for meson8 SoCs (S802) or - “amlogic,meson8b-gpio-intc” for meson8b SoCs (S805) or - “amlogic,meson-gxbb-gpio-intc” for GXBB SoCs (S905) or - “amlogic,meson-gxl-gpio-intc” for GXL SoCs (S905X, S912) +- compatible : must have "amlogic,meson8-gpio-intc" and either + "amlogic,meson8-gpio-intc" for meson8 SoCs (S802) or + "amlogic,meson8b-gpio-intc" for meson8b SoCs (S805) or + "amlogic,meson-gxbb-gpio-intc" for GXBB SoCs (S905) or + "amlogic,meson-gxl-gpio-intc" for GXL SoCs (S905X, S912) + "amlogic,meson-axg-gpio-intc" for AXG SoCs (A113D, A113X) - interrupt-parent : a phandle to the GIC the interrupts are routed to. Usually this is provided at the root level of the device tree as it is common to most of the SoC. diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt index 0a57f2f4167de1994bf5254ad99503fe22e3e224..3ea78c4ef887c9be43a48a287b9cabc77c1a2db4 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt @@ -57,6 +57,20 @@ Optional occupied by the redistributors. Required if more than one such region is present. +- msi-controller: Boolean property. Identifies the node as an MSI + controller. Only present if the Message Based Interrupt + functionnality is being exposed by the HW, and the mbi-ranges + property present. + +- mbi-ranges: A list of pairs , where "intid" is the first + SPI of a range that can be used an MBI, and "span" the size of that + range. Multiple ranges can be provided. Requires "msi-controller" to + be set. + +- mbi-alias: Address property. Base address of an alias of the GICD + region containing only the {SET,CLR}SPI registers to be used if + isolation is required, and if supported by the HW. + Sub-nodes: PPI affinity can be expressed as a single "ppi-partitions" node, @@ -99,6 +113,9 @@ Examples: <0x0 0x2c020000 0 0x2000>; // GICV interrupts = <1 9 4>; + msi-controller; + mbi-ranges = <256 128>; + gic-its@2c200000 { compatible = "arm,gic-v3-its"; msi-controller; diff --git a/Documentation/devicetree/bindings/interrupt-controller/nvidia,tegra20-ictlr.txt b/Documentation/devicetree/bindings/interrupt-controller/nvidia,tegra20-ictlr.txt index 1099fe0788fae19c27dd1153e6d9d9e4aba10c6f..f246ccbf8838c2c90496572af8aa4e4d17079be1 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/nvidia,tegra20-ictlr.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/nvidia,tegra20-ictlr.txt @@ -15,7 +15,7 @@ Required properties: include "nvidia,tegra30-ictlr". - reg : Specifies base physical address and size of the registers. Each controller must be described separately (Tegra20 has 4 of them, - whereas Tegra30 and later have 5" + whereas Tegra30 and later have 5). - interrupt-controller : Identifies the node as an interrupt controller. - #interrupt-cells : Specifies the number of cells needed to encode an interrupt source. The value must be 3. diff --git a/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt b/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt index edf03f09244b352565b6a15d0d8cde162cf2599f..6a36bf66d932d42320cfc6b3c998b16488609d61 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/st,stm32-exti.txt @@ -5,11 +5,14 @@ Required properties: - compatible: Should be: "st,stm32-exti" "st,stm32h7-exti" + "st,stm32mp1-exti" - reg: Specifies base physical address and size of the registers - interrupt-controller: Indentifies the node as an interrupt controller - #interrupt-cells: Specifies the number of cells to encode an interrupt specifier, shall be 2 - interrupts: interrupts references to primary interrupt controller + (only needed for exti controller with multiple exti under + same parent interrupt: st,stm32-exti and st,stm32h7-exti) Example: diff --git a/Documentation/devicetree/bindings/ipmi/npcm7xx-kcs-bmc.txt b/Documentation/devicetree/bindings/ipmi/npcm7xx-kcs-bmc.txt new file mode 100644 index 0000000000000000000000000000000000000000..3538a214fff156d461322f7e10b92821f9331484 --- /dev/null +++ b/Documentation/devicetree/bindings/ipmi/npcm7xx-kcs-bmc.txt @@ -0,0 +1,39 @@ +* Nuvoton NPCM7xx KCS (Keyboard Controller Style) IPMI interface + +The Nuvoton SOCs (NPCM7xx) are commonly used as BMCs +(Baseboard Management Controllers) and the KCS interface can be +used to perform in-band IPMI communication with their host. + +Required properties: +- compatible : should be one of + "nuvoton,npcm750-kcs-bmc" +- interrupts : interrupt generated by the controller +- kcs_chan : The KCS channel number in the controller + +Example: + + lpc_kcs: lpc_kcs@f0007000 { + compatible = "nuvoton,npcm750-lpc-kcs", "simple-mfd", "syscon"; + reg = <0xf0007000 0x40>; + reg-io-width = <1>; + + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0xf0007000 0x40>; + + kcs1: kcs1@0 { + compatible = "nuvoton,npcm750-kcs-bmc"; + reg = <0x0 0x40>; + interrupts = <0 9 4>; + kcs_chan = <1>; + status = "disabled"; + }; + + kcs2: kcs2@0 { + compatible = "nuvoton,npcm750-kcs-bmc"; + reg = <0x0 0x40>; + interrupts = <0 9 4>; + kcs_chan = <2>; + status = "disabled"; + }; + }; \ No newline at end of file diff --git a/Documentation/devicetree/bindings/leds/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/leds/backlight/pwm-backlight.txt index 764db86d441ad131aceca0e15146d5c914d97919..3108109066132698eaaff2606cd4a652d708a202 100644 --- a/Documentation/devicetree/bindings/leds/backlight/pwm-backlight.txt +++ b/Documentation/devicetree/bindings/leds/backlight/pwm-backlight.txt @@ -17,6 +17,10 @@ Optional properties: "pwms" property (see PWM binding[0]) - enable-gpios: contains a single GPIO specifier for the GPIO which enables and disables the backlight (see GPIO binding[1]) + - post-pwm-on-delay-ms: Delay in ms between setting an initial (non-zero) PWM + and enabling the backlight using GPIO. + - pwm-off-delay-ms: Delay in ms between disabling the backlight using GPIO + and setting PWM value to 0. [0]: Documentation/devicetree/bindings/pwm/pwm.txt [1]: Documentation/devicetree/bindings/gpio/gpio.txt @@ -32,4 +36,6 @@ Example: power-supply = <&vdd_bl_reg>; enable-gpios = <&gpio 58 0>; + post-pwm-on-delay-ms = <10>; + pwm-off-delay-ms = <10>; }; diff --git a/Documentation/devicetree/bindings/leds/backlight/zii,rave-sp-backlight.txt b/Documentation/devicetree/bindings/leds/backlight/zii,rave-sp-backlight.txt new file mode 100644 index 0000000000000000000000000000000000000000..ff5c921386502f296a88be04abf66a1e30ac5d92 --- /dev/null +++ b/Documentation/devicetree/bindings/leds/backlight/zii,rave-sp-backlight.txt @@ -0,0 +1,23 @@ +Zodiac Inflight Innovations RAVE Supervisory Processor Backlight Bindings + +RAVE SP backlight device is a "MFD cell" device corresponding to +backlight functionality of RAVE Supervisory Processor. It is expected +that its Device Tree node is specified as a child of the node +corresponding to the parent RAVE SP device (as documented in +Documentation/devicetree/bindings/mfd/zii,rave-sp.txt) + +Required properties: + +- compatible: Should be "zii,rave-sp-backlight" + +Example: + + rave-sp { + compatible = "zii,rave-sp-rdu1"; + current-speed = <38400>; + + backlight { + compatible = "zii,rave-sp-backlight"; + }; + } + diff --git a/Documentation/devicetree/bindings/leds/leds-cr0014114.txt b/Documentation/devicetree/bindings/leds/leds-cr0014114.txt new file mode 100644 index 0000000000000000000000000000000000000000..4255b19ad25cf597efdd99bf3591810d4432d707 --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-cr0014114.txt @@ -0,0 +1,54 @@ +Crane Merchandising System - cr0014114 LED driver +------------------------------------------------- + +This LED Board is widely used in vending machines produced +by Crane Merchandising Systems. + +Required properties: +- compatible: "crane,cr0014114" + +Property rules described in Documentation/devicetree/bindings/spi/spi-bus.txt +apply. In particular, "reg" and "spi-max-frequency" properties must be given. + +LED sub-node properties: +- label : + see Documentation/devicetree/bindings/leds/common.txt +- linux,default-trigger : (optional) + see Documentation/devicetree/bindings/leds/common.txt + +Example +------- + +led-controller@0 { + compatible = "crane,cr0014114"; + reg = <0>; + spi-max-frequency = <50000>; + #address-cells = <1>; + #size-cells = <0>; + + led@0 { + reg = <0>; + label = "red:coin"; + }; + led@1 { + reg = <1>; + label = "green:coin"; + }; + led@2 { + reg = <2>; + label = "blue:coin"; + }; + led@3 { + reg = <3>; + label = "red:bill"; + }; + led@4 { + reg = <4>; + label = "green:bill"; + }; + led@5 { + reg = <5>; + label = "blue:bill"; + }; + ... +}; diff --git a/Documentation/devicetree/bindings/leds/leds-lm3601x.txt b/Documentation/devicetree/bindings/leds/leds-lm3601x.txt new file mode 100644 index 0000000000000000000000000000000000000000..a88b2c41e75b7f81d5a7a8e95261d0a5f907b03b --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-lm3601x.txt @@ -0,0 +1,45 @@ +* Texas Instruments - lm3601x Single-LED Flash Driver + +The LM3601X are ultra-small LED flash drivers that +provide a high level of adjustability. + +Required properties: + - compatible : Can be one of the following + "ti,lm36010" + "ti,lm36011" + - reg : I2C slave address + - #address-cells : 1 + - #size-cells : 0 + +Required child properties: + - reg : 0 - Indicates a IR mode + 1 - Indicates a Torch (white LED) mode + +Required properties for flash LED child nodes: + See Documentation/devicetree/bindings/leds/common.txt + - flash-max-microamp : Range from 11mA - 1.5A + - flash-max-timeout-us : Range from 40ms - 1600ms + - led-max-microamp : Range from 2.4mA - 376mA + +Optional child properties: + - label : see Documentation/devicetree/bindings/leds/common.txt + +Example: +led-controller@64 { + compatible = "ti,lm36010"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x64>; + + led@0 { + reg = <1>; + label = "white:torch"; + led-max-microamp = <376000>; + flash-max-microamp = <1500000>; + flash-max-timeout-us = <1600000>; + }; +} + +For more product information please see the links below: +http://www.ti.com/product/LM36010 +http://www.ti.com/product/LM36011 diff --git a/Documentation/devicetree/bindings/leds/leds-sc27xx-bltc.txt b/Documentation/devicetree/bindings/leds/leds-sc27xx-bltc.txt new file mode 100644 index 0000000000000000000000000000000000000000..dddf84f9c7ea5d1cad32715731e5ddf5640b6ad0 --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-sc27xx-bltc.txt @@ -0,0 +1,41 @@ +LEDs connected to Spreadtrum SC27XX PMIC breathing light controller + +The SC27xx breathing light controller supports to 3 outputs: +red LED, green LED and blue LED. Each LED can work at normal +PWM mode or breath light mode. + +Required properties: +- compatible: Should be "sprd,sc2731-bltc". +- #address-cells: Must be 1. +- #size-cells: Must be 0. +- reg: Specify the controller address. + +Required child properties: +- reg: Port this LED is connected to. + +Optional child properties: +- label: See Documentation/devicetree/bindings/leds/common.txt. + +Examples: + +led-controller@200 { + compatible = "sprd,sc2731-bltc"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x200>; + + led@0 { + label = "red"; + reg = <0x0>; + }; + + led@1 { + label = "green"; + reg = <0x1>; + }; + + led@2 { + label = "blue"; + reg = <0x2>; + }; +}; diff --git a/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.txt b/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.txt index 16964f0c17733febf55cce4b125ec2a983660fa6..6e8a9ab0fdaebb9919168b2923c45c88cdc8c197 100644 --- a/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.txt +++ b/Documentation/devicetree/bindings/mailbox/qcom,apcs-kpss-global.txt @@ -10,6 +10,8 @@ platforms. Definition: must be one of: "qcom,msm8916-apcs-kpss-global", "qcom,msm8996-apcs-hmss-global" + "qcom,msm8998-apcs-hmss-global" + "qcom,sdm845-apss-shared" - reg: Usage: required diff --git a/Documentation/devicetree/bindings/mailbox/stm32-ipcc.txt b/Documentation/devicetree/bindings/mailbox/stm32-ipcc.txt new file mode 100644 index 0000000000000000000000000000000000000000..1d2b7fee7b853e52bf5b6d2e2c718e96e3f85456 --- /dev/null +++ b/Documentation/devicetree/bindings/mailbox/stm32-ipcc.txt @@ -0,0 +1,47 @@ +* STMicroelectronics STM32 IPCC (Inter-Processor Communication Controller) + +The IPCC block provides a non blocking signaling mechanism to post and +retrieve messages in an atomic way between two processors. +It provides the signaling for N bidirectionnal channels. The number of channels +(N) can be read from a dedicated register. + +Required properties: +- compatible: Must be "st,stm32mp1-ipcc" +- reg: Register address range (base address and length) +- st,proc-id: Processor id using the mailbox (0 or 1) +- clocks: Input clock +- interrupt-names: List of names for the interrupts described by the interrupt + property. Must contain the following entries: + - "rx" + - "tx" + - "wakeup" +- interrupts: Interrupt specifiers for "rx channel occupied", "tx channel + free" and "system wakeup". +- #mbox-cells: Number of cells required for the mailbox specifier. Must be 1. + The data contained in the mbox specifier of the "mboxes" + property in the client node is the mailbox channel index. + +Optional properties: +- wakeup-source: Flag to indicate whether this device can wake up the system + + + +Example: + ipcc: mailbox@4c001000 { + compatible = "st,stm32mp1-ipcc"; + #mbox-cells = <1>; + reg = <0x4c001000 0x400>; + st,proc-id = <0>; + interrupts-extended = <&intc GIC_SPI 100 IRQ_TYPE_NONE>, + <&intc GIC_SPI 101 IRQ_TYPE_NONE>, + <&aiec 62 1>; + interrupt-names = "rx", "tx", "wakeup"; + clocks = <&rcc_clk IPCC>; + wakeup-source; + } + +Client: + mbox_test { + ... + mboxes = <&ipcc 0>, <&ipcc 1>; + }; diff --git a/Documentation/devicetree/bindings/marvell.txt b/Documentation/devicetree/bindings/marvell.txt deleted file mode 100644 index 7f722316458a3eae58969dd959d37bb4d2aadb75..0000000000000000000000000000000000000000 --- a/Documentation/devicetree/bindings/marvell.txt +++ /dev/null @@ -1,516 +0,0 @@ -Marvell Discovery mv64[345]6x System Controller chips -=========================================================== - -The Marvell mv64[345]60 series of system controller chips contain -many of the peripherals needed to implement a complete computer -system. In this section, we define device tree nodes to describe -the system controller chip itself and each of the peripherals -which it contains. Compatible string values for each node are -prefixed with the string "marvell,", for Marvell Technology Group Ltd. - -1) The /system-controller node - - This node is used to represent the system-controller and must be - present when the system uses a system controller chip. The top-level - system-controller node contains information that is global to all - devices within the system controller chip. The node name begins - with "system-controller" followed by the unit address, which is - the base address of the memory-mapped register set for the system - controller chip. - - Required properties: - - - ranges : Describes the translation of system controller addresses - for memory mapped registers. - - clock-frequency: Contains the main clock frequency for the system - controller chip. - - reg : This property defines the address and size of the - memory-mapped registers contained within the system controller - chip. The address specified in the "reg" property should match - the unit address of the system-controller node. - - #address-cells : Address representation for system controller - devices. This field represents the number of cells needed to - represent the address of the memory-mapped registers of devices - within the system controller chip. - - #size-cells : Size representation for the memory-mapped - registers within the system controller chip. - - #interrupt-cells : Defines the width of cells used to represent - interrupts. - - Optional properties: - - - model : The specific model of the system controller chip. Such - as, "mv64360", "mv64460", or "mv64560". - - compatible : A string identifying the compatibility identifiers - of the system controller chip. - - The system-controller node contains child nodes for each system - controller device that the platform uses. Nodes should not be created - for devices which exist on the system controller chip but are not used - - Example Marvell Discovery mv64360 system-controller node: - - system-controller@f1000000 { /* Marvell Discovery mv64360 */ - #address-cells = <1>; - #size-cells = <1>; - model = "mv64360"; /* Default */ - compatible = "marvell,mv64360"; - clock-frequency = <133333333>; - reg = <0xf1000000 0x10000>; - virtual-reg = <0xf1000000>; - ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */ - 0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */ - 0xa0000000 0xa0000000 0x4000000 /* User FLASH */ - 0x00000000 0xf1000000 0x0010000 /* Bridge's regs */ - 0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */ - - [ child node definitions... ] - } - -2) Child nodes of /system-controller - - a) Marvell Discovery MDIO bus - - The MDIO is a bus to which the PHY devices are connected. For each - device that exists on this bus, a child node should be created. See - the definition of the PHY node below for an example of how to define - a PHY. - - Required properties: - - #address-cells : Should be <1> - - #size-cells : Should be <0> - - compatible : Should be "marvell,mv64360-mdio" - - Example: - - mdio { - #address-cells = <1>; - #size-cells = <0>; - compatible = "marvell,mv64360-mdio"; - - ethernet-phy@0 { - ...... - }; - }; - - - b) Marvell Discovery ethernet controller - - The Discover ethernet controller is described with two levels - of nodes. The first level describes an ethernet silicon block - and the second level describes up to 3 ethernet nodes within - that block. The reason for the multiple levels is that the - registers for the node are interleaved within a single set - of registers. The "ethernet-block" level describes the - shared register set, and the "ethernet" nodes describe ethernet - port-specific properties. - - Ethernet block node - - Required properties: - - #address-cells : <1> - - #size-cells : <0> - - compatible : "marvell,mv64360-eth-block" - - reg : Offset and length of the register set for this block - - Optional properties: - - clocks : Phandle to the clock control device and gate bit - - Example Discovery Ethernet block node: - ethernet-block@2000 { - #address-cells = <1>; - #size-cells = <0>; - compatible = "marvell,mv64360-eth-block"; - reg = <0x2000 0x2000>; - ethernet@0 { - ....... - }; - }; - - Ethernet port node - - Required properties: - - compatible : Should be "marvell,mv64360-eth". - - reg : Should be <0>, <1>, or <2>, according to which registers - within the silicon block the device uses. - - interrupts : where a is the interrupt number for the port. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - phy : the phandle for the PHY connected to this ethernet - controller. - - local-mac-address : 6 bytes, MAC address - - Example Discovery Ethernet port node: - ethernet@0 { - compatible = "marvell,mv64360-eth"; - reg = <0>; - interrupts = <32>; - interrupt-parent = <&PIC>; - phy = <&PHY0>; - local-mac-address = [ 00 00 00 00 00 00 ]; - }; - - - - c) Marvell Discovery PHY nodes - - Required properties: - - interrupts : where a is the interrupt number for this phy. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - reg : The ID number for the phy, usually a small integer - - Example Discovery PHY node: - ethernet-phy@1 { - compatible = "broadcom,bcm5421"; - interrupts = <76>; /* GPP 12 */ - interrupt-parent = <&PIC>; - reg = <1>; - }; - - - d) Marvell Discovery SDMA nodes - - Represent DMA hardware associated with the MPSC (multiprotocol - serial controllers). - - Required properties: - - compatible : "marvell,mv64360-sdma" - - reg : Offset and length of the register set for this device - - interrupts : where a is the interrupt number for the DMA - device. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery SDMA node: - sdma@4000 { - compatible = "marvell,mv64360-sdma"; - reg = <0x4000 0xc18>; - virtual-reg = <0xf1004000>; - interrupts = <36>; - interrupt-parent = <&PIC>; - }; - - - e) Marvell Discovery BRG nodes - - Represent baud rate generator hardware associated with the MPSC - (multiprotocol serial controllers). - - Required properties: - - compatible : "marvell,mv64360-brg" - - reg : Offset and length of the register set for this device - - clock-src : A value from 0 to 15 which selects the clock - source for the baud rate generator. This value corresponds - to the CLKS value in the BRGx configuration register. See - the mv64x60 User's Manual. - - clock-frequence : The frequency (in Hz) of the baud rate - generator's input clock. - - current-speed : The current speed setting (presumably by - firmware) of the baud rate generator. - - Example Discovery BRG node: - brg@b200 { - compatible = "marvell,mv64360-brg"; - reg = <0xb200 0x8>; - clock-src = <8>; - clock-frequency = <133333333>; - current-speed = <9600>; - }; - - - f) Marvell Discovery CUNIT nodes - - Represent the Serial Communications Unit device hardware. - - Required properties: - - reg : Offset and length of the register set for this device - - Example Discovery CUNIT node: - cunit@f200 { - reg = <0xf200 0x200>; - }; - - - g) Marvell Discovery MPSCROUTING nodes - - Represent the Discovery's MPSC routing hardware - - Required properties: - - reg : Offset and length of the register set for this device - - Example Discovery CUNIT node: - mpscrouting@b500 { - reg = <0xb400 0xc>; - }; - - - h) Marvell Discovery MPSCINTR nodes - - Represent the Discovery's MPSC DMA interrupt hardware registers - (SDMA cause and mask registers). - - Required properties: - - reg : Offset and length of the register set for this device - - Example Discovery MPSCINTR node: - mpsintr@b800 { - reg = <0xb800 0x100>; - }; - - - i) Marvell Discovery MPSC nodes - - Represent the Discovery's MPSC (Multiprotocol Serial Controller) - serial port. - - Required properties: - - compatible : "marvell,mv64360-mpsc" - - reg : Offset and length of the register set for this device - - sdma : the phandle for the SDMA node used by this port - - brg : the phandle for the BRG node used by this port - - cunit : the phandle for the CUNIT node used by this port - - mpscrouting : the phandle for the MPSCROUTING node used by this port - - mpscintr : the phandle for the MPSCINTR node used by this port - - cell-index : the hardware index of this cell in the MPSC core - - max_idle : value needed for MPSC CHR3 (Maximum Frame Length) - register - - interrupts : where a is the interrupt number for the MPSC. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery MPSCINTR node: - mpsc@8000 { - compatible = "marvell,mv64360-mpsc"; - reg = <0x8000 0x38>; - virtual-reg = <0xf1008000>; - sdma = <&SDMA0>; - brg = <&BRG0>; - cunit = <&CUNIT>; - mpscrouting = <&MPSCROUTING>; - mpscintr = <&MPSCINTR>; - cell-index = <0>; - max_idle = <40>; - interrupts = <40>; - interrupt-parent = <&PIC>; - }; - - - j) Marvell Discovery Watch Dog Timer nodes - - Represent the Discovery's watchdog timer hardware - - Required properties: - - compatible : "marvell,mv64360-wdt" - - reg : Offset and length of the register set for this device - - Example Discovery Watch Dog Timer node: - wdt@b410 { - compatible = "marvell,mv64360-wdt"; - reg = <0xb410 0x8>; - }; - - - k) Marvell Discovery I2C nodes - - Represent the Discovery's I2C hardware - - Required properties: - - device_type : "i2c" - - compatible : "marvell,mv64360-i2c" - - reg : Offset and length of the register set for this device - - interrupts : where a is the interrupt number for the I2C. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery I2C node: - compatible = "marvell,mv64360-i2c"; - reg = <0xc000 0x20>; - virtual-reg = <0xf100c000>; - interrupts = <37>; - interrupt-parent = <&PIC>; - }; - - - l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes - - Represent the Discovery's PIC hardware - - Required properties: - - #interrupt-cells : <1> - - #address-cells : <0> - - compatible : "marvell,mv64360-pic" - - reg : Offset and length of the register set for this device - - interrupt-controller - - Example Discovery PIC node: - pic { - #interrupt-cells = <1>; - #address-cells = <0>; - compatible = "marvell,mv64360-pic"; - reg = <0x0 0x88>; - interrupt-controller; - }; - - - m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes - - Represent the Discovery's MPP hardware - - Required properties: - - compatible : "marvell,mv64360-mpp" - - reg : Offset and length of the register set for this device - - Example Discovery MPP node: - mpp@f000 { - compatible = "marvell,mv64360-mpp"; - reg = <0xf000 0x10>; - }; - - - n) Marvell Discovery GPP (General Purpose Pins) nodes - - Represent the Discovery's GPP hardware - - Required properties: - - compatible : "marvell,mv64360-gpp" - - reg : Offset and length of the register set for this device - - Example Discovery GPP node: - gpp@f000 { - compatible = "marvell,mv64360-gpp"; - reg = <0xf100 0x20>; - }; - - - o) Marvell Discovery PCI host bridge node - - Represents the Discovery's PCI host bridge device. The properties - for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE - 1275-1994. A typical value for the compatible property is - "marvell,mv64360-pci". - - Example Discovery PCI host bridge node - pci@80000000 { - #address-cells = <3>; - #size-cells = <2>; - #interrupt-cells = <1>; - device_type = "pci"; - compatible = "marvell,mv64360-pci"; - reg = <0xcf8 0x8>; - ranges = <0x01000000 0x0 0x0 - 0x88000000 0x0 0x01000000 - 0x02000000 0x0 0x80000000 - 0x80000000 0x0 0x08000000>; - bus-range = <0 255>; - clock-frequency = <66000000>; - interrupt-parent = <&PIC>; - interrupt-map-mask = <0xf800 0x0 0x0 0x7>; - interrupt-map = < - /* IDSEL 0x0a */ - 0x5000 0 0 1 &PIC 80 - 0x5000 0 0 2 &PIC 81 - 0x5000 0 0 3 &PIC 91 - 0x5000 0 0 4 &PIC 93 - - /* IDSEL 0x0b */ - 0x5800 0 0 1 &PIC 91 - 0x5800 0 0 2 &PIC 93 - 0x5800 0 0 3 &PIC 80 - 0x5800 0 0 4 &PIC 81 - - /* IDSEL 0x0c */ - 0x6000 0 0 1 &PIC 91 - 0x6000 0 0 2 &PIC 93 - 0x6000 0 0 3 &PIC 80 - 0x6000 0 0 4 &PIC 81 - - /* IDSEL 0x0d */ - 0x6800 0 0 1 &PIC 93 - 0x6800 0 0 2 &PIC 80 - 0x6800 0 0 3 &PIC 81 - 0x6800 0 0 4 &PIC 91 - >; - }; - - - p) Marvell Discovery CPU Error nodes - - Represent the Discovery's CPU error handler device. - - Required properties: - - compatible : "marvell,mv64360-cpu-error" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery CPU Error node: - cpu-error@70 { - compatible = "marvell,mv64360-cpu-error"; - reg = <0x70 0x10 0x128 0x28>; - interrupts = <3>; - interrupt-parent = <&PIC>; - }; - - - q) Marvell Discovery SRAM Controller nodes - - Represent the Discovery's SRAM controller device. - - Required properties: - - compatible : "marvell,mv64360-sram-ctrl" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery SRAM Controller node: - sram-ctrl@380 { - compatible = "marvell,mv64360-sram-ctrl"; - reg = <0x380 0x80>; - interrupts = <13>; - interrupt-parent = <&PIC>; - }; - - - r) Marvell Discovery PCI Error Handler nodes - - Represent the Discovery's PCI error handler device. - - Required properties: - - compatible : "marvell,mv64360-pci-error" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery PCI Error Handler node: - pci-error@1d40 { - compatible = "marvell,mv64360-pci-error"; - reg = <0x1d40 0x40 0xc28 0x4>; - interrupts = <12>; - interrupt-parent = <&PIC>; - }; - - - s) Marvell Discovery Memory Controller nodes - - Represent the Discovery's memory controller device. - - Required properties: - - compatible : "marvell,mv64360-mem-ctrl" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery Memory Controller node: - mem-ctrl@1400 { - compatible = "marvell,mv64360-mem-ctrl"; - reg = <0x1400 0x60>; - interrupts = <17>; - interrupt-parent = <&PIC>; - }; - - diff --git a/Documentation/devicetree/bindings/media/cdns,csi2rx.txt b/Documentation/devicetree/bindings/media/cdns,csi2rx.txt new file mode 100644 index 0000000000000000000000000000000000000000..6b02a0657ad9794265e65539da57cc65c7e86242 --- /dev/null +++ b/Documentation/devicetree/bindings/media/cdns,csi2rx.txt @@ -0,0 +1,100 @@ +Cadence MIPI-CSI2 RX controller +=============================== + +The Cadence MIPI-CSI2 RX controller is a CSI-2 bridge supporting up to 4 CSI +lanes in input, and 4 different pixel streams in output. + +Required properties: + - compatible: must be set to "cdns,csi2rx" and an SoC-specific compatible + - reg: base address and size of the memory mapped region + - clocks: phandles to the clocks driving the controller + - clock-names: must contain: + * sys_clk: main clock + * p_clk: register bank clock + * pixel_if[0-3]_clk: pixel stream output clock, one for each stream + implemented in hardware, between 0 and 3 + +Optional properties: + - phys: phandle to the external D-PHY, phy-names must be provided + - phy-names: must contain "dphy", if the implementation uses an + external D-PHY + +Required subnodes: + - ports: A ports node with one port child node per device input and output + port, in accordance with the video interface bindings defined in + Documentation/devicetree/bindings/media/video-interfaces.txt. The + port nodes are numbered as follows: + + Port Description + ----------------------------- + 0 CSI-2 input + 1 Stream 0 output + 2 Stream 1 output + 3 Stream 2 output + 4 Stream 3 output + + The stream output port nodes are optional if they are not + connected to anything at the hardware level or implemented + in the design.Since there is only one endpoint per port, + the endpoints are not numbered. + + +Example: + +csi2rx: csi-bridge@0d060000 { + compatible = "cdns,csi2rx"; + reg = <0x0d060000 0x1000>; + clocks = <&byteclock>, <&byteclock> + <&coreclock>, <&coreclock>, + <&coreclock>, <&coreclock>; + clock-names = "sys_clk", "p_clk", + "pixel_if0_clk", "pixel_if1_clk", + "pixel_if2_clk", "pixel_if3_clk"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + + csi2rx_in_sensor: endpoint { + remote-endpoint = <&sensor_out_csi2rx>; + clock-lanes = <0>; + data-lanes = <1 2>; + }; + }; + + port@1 { + reg = <1>; + + csi2rx_out_grabber0: endpoint { + remote-endpoint = <&grabber0_in_csi2rx>; + }; + }; + + port@2 { + reg = <2>; + + csi2rx_out_grabber1: endpoint { + remote-endpoint = <&grabber1_in_csi2rx>; + }; + }; + + port@3 { + reg = <3>; + + csi2rx_out_grabber2: endpoint { + remote-endpoint = <&grabber2_in_csi2rx>; + }; + }; + + port@4 { + reg = <4>; + + csi2rx_out_grabber3: endpoint { + remote-endpoint = <&grabber3_in_csi2rx>; + }; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/media/cdns,csi2tx.txt b/Documentation/devicetree/bindings/media/cdns,csi2tx.txt new file mode 100644 index 0000000000000000000000000000000000000000..459c6e332f5290a9a746306041ab2a848286a0f3 --- /dev/null +++ b/Documentation/devicetree/bindings/media/cdns,csi2tx.txt @@ -0,0 +1,98 @@ +Cadence MIPI-CSI2 TX controller +=============================== + +The Cadence MIPI-CSI2 TX controller is a CSI-2 bridge supporting up to +4 CSI lanes in output, and up to 4 different pixel streams in input. + +Required properties: + - compatible: must be set to "cdns,csi2tx" + - reg: base address and size of the memory mapped region + - clocks: phandles to the clocks driving the controller + - clock-names: must contain: + * esc_clk: escape mode clock + * p_clk: register bank clock + * pixel_if[0-3]_clk: pixel stream output clock, one for each stream + implemented in hardware, between 0 and 3 + +Optional properties + - phys: phandle to the D-PHY. If it is set, phy-names need to be set + - phy-names: must contain "dphy" + +Required subnodes: + - ports: A ports node with one port child node per device input and output + port, in accordance with the video interface bindings defined in + Documentation/devicetree/bindings/media/video-interfaces.txt. The + port nodes are numbered as follows. + + Port Description + ----------------------------- + 0 CSI-2 output + 1 Stream 0 input + 2 Stream 1 input + 3 Stream 2 input + 4 Stream 3 input + + The stream input port nodes are optional if they are not + connected to anything at the hardware level or implemented + in the design. Since there is only one endpoint per port, + the endpoints are not numbered. + +Example: + +csi2tx: csi-bridge@0d0e1000 { + compatible = "cdns,csi2tx"; + reg = <0x0d0e1000 0x1000>; + clocks = <&byteclock>, <&byteclock>, + <&coreclock>, <&coreclock>, + <&coreclock>, <&coreclock>; + clock-names = "p_clk", "esc_clk", + "pixel_if0_clk", "pixel_if1_clk", + "pixel_if2_clk", "pixel_if3_clk"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + + csi2tx_out: endpoint { + remote-endpoint = <&remote_in>; + clock-lanes = <0>; + data-lanes = <1 2>; + }; + }; + + port@1 { + reg = <1>; + + csi2tx_in_stream0: endpoint { + remote-endpoint = <&stream0_out>; + }; + }; + + port@2 { + reg = <2>; + + csi2tx_in_stream1: endpoint { + remote-endpoint = <&stream1_out>; + }; + }; + + port@3 { + reg = <3>; + + csi2tx_in_stream2: endpoint { + remote-endpoint = <&stream2_out>; + }; + }; + + port@4 { + reg = <4>; + + csi2tx_in_stream3: endpoint { + remote-endpoint = <&stream3_out>; + }; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/media/i2c/ov7251.txt b/Documentation/devicetree/bindings/media/i2c/ov7251.txt new file mode 100644 index 0000000000000000000000000000000000000000..8281151f7493e3f5c28006002c80de97dd0b3646 --- /dev/null +++ b/Documentation/devicetree/bindings/media/i2c/ov7251.txt @@ -0,0 +1,52 @@ +* Omnivision 1/7.5-Inch B&W VGA CMOS Digital Image Sensor + +The Omnivision OV7251 is a 1/7.5-Inch CMOS active pixel digital image sensor +with an active array size of 640H x 480V. It is programmable through a serial +I2C interface. + +Required Properties: +- compatible: Value should be "ovti,ov7251". +- clocks: Reference to the xclk clock. +- clock-names: Should be "xclk". +- clock-frequency: Frequency of the xclk clock. +- enable-gpios: Chip enable GPIO. Polarity is GPIO_ACTIVE_HIGH. This corresponds + to the hardware pin XSHUTDOWN which is physically active low. +- vdddo-supply: Chip digital IO regulator. +- vdda-supply: Chip analog regulator. +- vddd-supply: Chip digital core regulator. + +The device node shall contain one 'port' child node with a single 'endpoint' +subnode for its digital output video port, in accordance with the video +interface bindings defined in +Documentation/devicetree/bindings/media/video-interfaces.txt. + +Example: + + &i2c1 { + ... + + ov7251: camera-sensor@60 { + compatible = "ovti,ov7251"; + reg = <0x60>; + + enable-gpios = <&gpio1 6 GPIO_ACTIVE_HIGH>; + pinctrl-names = "default"; + pinctrl-0 = <&camera_bw_default>; + + clocks = <&clks 200>; + clock-names = "xclk"; + clock-frequency = <24000000>; + + vdddo-supply = <&camera_dovdd_1v8>; + vdda-supply = <&camera_avdd_2v8>; + vddd-supply = <&camera_dvdd_1v2>; + + port { + ov7251_ep: endpoint { + clock-lanes = <1>; + data-lanes = <0>; + remote-endpoint = <&csi0_ep>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/media/i2c/ov772x.txt b/Documentation/devicetree/bindings/media/i2c/ov772x.txt new file mode 100644 index 0000000000000000000000000000000000000000..0b3ede5b8e6aea41ccca0e638a5ed66c69c008d5 --- /dev/null +++ b/Documentation/devicetree/bindings/media/i2c/ov772x.txt @@ -0,0 +1,40 @@ +* Omnivision OV7720/OV7725 CMOS sensor + +The Omnivision OV7720/OV7725 sensor supports multiple resolutions output, +such as VGA, QVGA, and any size scaling down from CIF to 40x30. It also can +support the YUV422, RGB565/555/444, GRB422 or raw RGB output formats. + +Required Properties: +- compatible: shall be one of + "ovti,ov7720" + "ovti,ov7725" +- clocks: reference to the xclk input clock. + +Optional Properties: +- reset-gpios: reference to the GPIO connected to the RSTB pin which is + active low, if any. +- powerdown-gpios: reference to the GPIO connected to the PWDN pin which is + active high, if any. + +The device node shall contain one 'port' child node with one child 'endpoint' +subnode for its digital output video port, in accordance with the video +interface bindings defined in Documentation/devicetree/bindings/media/ +video-interfaces.txt. + +Example: + +&i2c0 { + ov772x: camera@21 { + compatible = "ovti,ov7725"; + reg = <0x21>; + reset-gpios = <&axi_gpio_0 0 GPIO_ACTIVE_LOW>; + powerdown-gpios = <&axi_gpio_0 1 GPIO_ACTIVE_LOW>; + clocks = <&xclk>; + + port { + ov772x_0: endpoint { + remote-endpoint = <&vcap1_in0>; + }; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/media/i2c/panasonic,amg88xx.txt b/Documentation/devicetree/bindings/media/i2c/panasonic,amg88xx.txt new file mode 100644 index 0000000000000000000000000000000000000000..4a3181a3dd7e2f606a9517ff747f0ceb38f2cf49 --- /dev/null +++ b/Documentation/devicetree/bindings/media/i2c/panasonic,amg88xx.txt @@ -0,0 +1,19 @@ +* Panasonic AMG88xx + +The Panasonic family of AMG88xx Grid-Eye sensors allow recording +8x8 10Hz video which consists of thermal datapoints + +Required Properties: + - compatible : Must be "panasonic,amg88xx" + - reg : i2c address of the device + +Example: + + i2c0@1c22000 { + ... + amg88xx@69 { + compatible = "panasonic,amg88xx"; + reg = <0x69>; + }; + ... + }; diff --git a/Documentation/devicetree/bindings/media/rcar_vin.txt b/Documentation/devicetree/bindings/media/rcar_vin.txt index 1ce7ff9449c556d95fa3b167683c4d1eed153ba7..a19517e1c669eb354efadad4f2bd0532da5245e0 100644 --- a/Documentation/devicetree/bindings/media/rcar_vin.txt +++ b/Documentation/devicetree/bindings/media/rcar_vin.txt @@ -2,20 +2,28 @@ Renesas R-Car Video Input driver (rcar_vin) ------------------------------------------- The rcar_vin device provides video input capabilities for the Renesas R-Car -family of devices. The current blocks are always slaves and suppot one input -channel which can be either RGB, YUYV or BT656. +family of devices. + +Each VIN instance has a single parallel input that supports RGB and YUV video, +with both external synchronization and BT.656 synchronization for the latter. +Depending on the instance the VIN input is connected to external SoC pins, or +on Gen3 platforms to a CSI-2 receiver. - compatible: Must be one or more of the following - - "renesas,vin-r8a7795" for the R8A7795 device - - "renesas,vin-r8a7794" for the R8A7794 device - - "renesas,vin-r8a7793" for the R8A7793 device - - "renesas,vin-r8a7792" for the R8A7792 device - - "renesas,vin-r8a7791" for the R8A7791 device - - "renesas,vin-r8a7790" for the R8A7790 device - - "renesas,vin-r8a7779" for the R8A7779 device + - "renesas,vin-r8a7743" for the R8A7743 device + - "renesas,vin-r8a7745" for the R8A7745 device - "renesas,vin-r8a7778" for the R8A7778 device - - "renesas,rcar-gen2-vin" for a generic R-Car Gen2 compatible device. - - "renesas,rcar-gen3-vin" for a generic R-Car Gen3 compatible device. + - "renesas,vin-r8a7779" for the R8A7779 device + - "renesas,vin-r8a7790" for the R8A7790 device + - "renesas,vin-r8a7791" for the R8A7791 device + - "renesas,vin-r8a7792" for the R8A7792 device + - "renesas,vin-r8a7793" for the R8A7793 device + - "renesas,vin-r8a7794" for the R8A7794 device + - "renesas,vin-r8a7795" for the R8A7795 device + - "renesas,vin-r8a7796" for the R8A7796 device + - "renesas,vin-r8a77970" for the R8A77970 device + - "renesas,rcar-gen2-vin" for a generic R-Car Gen2 or RZ/G1 compatible + device. When compatible with the generic version nodes must list the SoC-specific version corresponding to the platform first @@ -28,21 +36,38 @@ channel which can be either RGB, YUYV or BT656. Additionally, an alias named vinX will need to be created to specify which video input device this is. -The per-board settings: +The per-board settings Gen2 platforms: - port sub-node describing a single endpoint connected to the vin as described in video-interfaces.txt[1]. Only the first one will be considered as each vin interface has one input port. - These settings are used to work out video input format and widths - into the system. +The per-board settings Gen3 platforms: +Gen3 platforms can support both a single connected parallel input source +from external SoC pins (port0) and/or multiple parallel input sources +from local SoC CSI-2 receivers (port1) depending on SoC. -Device node example -------------------- +- renesas,id - ID number of the VIN, VINx in the documentation. +- ports + - port 0 - sub-node describing a single endpoint connected to the VIN + from external SoC pins described in video-interfaces.txt[1]. + Describing more then one endpoint in port 0 is invalid. Only VIN + instances that are connected to external pins should have port 0. + - port 1 - sub-nodes describing one or more endpoints connected to + the VIN from local SoC CSI-2 receivers. The endpoint numbers must + use the following schema. - aliases { - vin0 = &vin0; - }; + - Endpoint 0 - sub-node describing the endpoint connected to CSI20 + - Endpoint 1 - sub-node describing the endpoint connected to CSI21 + - Endpoint 2 - sub-node describing the endpoint connected to CSI40 + - Endpoint 3 - sub-node describing the endpoint connected to CSI41 + +Device node example for Gen2 platforms +-------------------------------------- + + aliases { + vin0 = &vin0; + }; vin0: vin@e6ef0000 { compatible = "renesas,vin-r8a7790", "renesas,rcar-gen2-vin"; @@ -52,8 +77,8 @@ Device node example status = "disabled"; }; -Board setup example (vin1 composite video input) ------------------------------------------------- +Board setup example for Gen2 platforms (vin1 composite video input) +------------------------------------------------------------------- &i2c2 { status = "okay"; @@ -92,6 +117,77 @@ Board setup example (vin1 composite video input) }; }; +Device node example for Gen3 platforms +-------------------------------------- + + vin0: video@e6ef0000 { + compatible = "renesas,vin-r8a7795"; + reg = <0 0xe6ef0000 0 0x1000>; + interrupts = ; + clocks = <&cpg CPG_MOD 811>; + power-domains = <&sysc R8A7795_PD_ALWAYS_ON>; + resets = <&cpg 811>; + renesas,id = <0>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@1 { + #address-cells = <1>; + #size-cells = <0>; + + reg = <1>; + + vin0csi20: endpoint@0 { + reg = <0>; + remote-endpoint= <&csi20vin0>; + }; + vin0csi21: endpoint@1 { + reg = <1>; + remote-endpoint= <&csi21vin0>; + }; + vin0csi40: endpoint@2 { + reg = <2>; + remote-endpoint= <&csi40vin0>; + }; + }; + }; + }; + csi20: csi2@fea80000 { + compatible = "renesas,r8a7795-csi2"; + reg = <0 0xfea80000 0 0x10000>; + interrupts = ; + clocks = <&cpg CPG_MOD 714>; + power-domains = <&sysc R8A7795_PD_ALWAYS_ON>; + resets = <&cpg 714>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + csi20_in: endpoint { + clock-lanes = <0>; + data-lanes = <1>; + remote-endpoint = <&adv7482_txb>; + }; + }; + + port@1 { + #address-cells = <1>; + #size-cells = <0>; + + reg = <1>; + + csi20vin0: endpoint@0 { + reg = <0>; + remote-endpoint = <&vin0csi20>; + }; + }; + }; + }; [1] video-interfaces.txt common video media interface diff --git a/Documentation/devicetree/bindings/media/renesas,ceu.txt b/Documentation/devicetree/bindings/media/renesas,ceu.txt index 3fc66dfb192cd1bf2d5a307a20c92e83f59755e6..8a7a616e9019b12f2e498b0bfa16ec12958622ad 100644 --- a/Documentation/devicetree/bindings/media/renesas,ceu.txt +++ b/Documentation/devicetree/bindings/media/renesas,ceu.txt @@ -2,14 +2,15 @@ Renesas Capture Engine Unit (CEU) ---------------------------------------------- The Capture Engine Unit is the image capture interface found in the Renesas -SH Mobile and RZ SoCs. +SH Mobile, R-Mobile and RZ SoCs. The interface supports a single parallel input with data bus width of 8 or 16 bits. Required properties: -- compatible: Shall be "renesas,r7s72100-ceu" for CEU units found in RZ/A1H - and RZ/A1M SoCs. +- compatible: Shall be one of the following values: + "renesas,r7s72100-ceu" for CEU units found in RZ/A1H and RZ/A1M SoCs + "renesas,r8a7740-ceu" for CEU units found in R-Mobile A1 R8A7740 SoCs - reg: Registers address base and size. - interrupts: The interrupt specifier. diff --git a/Documentation/devicetree/bindings/media/renesas,rcar-csi2.txt b/Documentation/devicetree/bindings/media/renesas,rcar-csi2.txt new file mode 100644 index 0000000000000000000000000000000000000000..2d385b65b275bc584a0b7022c8f4ea8e2b8c6b99 --- /dev/null +++ b/Documentation/devicetree/bindings/media/renesas,rcar-csi2.txt @@ -0,0 +1,101 @@ +Renesas R-Car MIPI CSI-2 +------------------------ + +The R-Car CSI-2 receiver device provides MIPI CSI-2 capabilities for the +Renesas R-Car family of devices. It is used in conjunction with the +R-Car VIN module, which provides the video capture capabilities. + +Mandatory properties +-------------------- + - compatible: Must be one or more of the following + - "renesas,r8a7795-csi2" for the R8A7795 device. + - "renesas,r8a7796-csi2" for the R8A7796 device. + - "renesas,r8a77965-csi2" for the R8A77965 device. + - "renesas,r8a77970-csi2" for the R8A77970 device. + + - reg: the register base and size for the device registers + - interrupts: the interrupt for the device + - clocks: reference to the parent clock + +The device node shall contain two 'port' child nodes according to the +bindings defined in Documentation/devicetree/bindings/media/ +video-interfaces.txt. port@0 shall connect to the CSI-2 source. port@1 +shall connect to all the R-Car VIN modules that have a hardware +connection to the CSI-2 receiver. + +- port@0- Video source (mandatory) + - endpoint@0 - sub-node describing the endpoint that is the video source + +- port@1 - VIN instances (optional) + - One endpoint sub-node for every R-Car VIN instance which is connected + to the R-Car CSI-2 receiver. + +Example: + + csi20: csi2@fea80000 { + compatible = "renesas,r8a7796-csi2"; + reg = <0 0xfea80000 0 0x10000>; + interrupts = <0 184 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cpg CPG_MOD 714>; + power-domains = <&sysc R8A7796_PD_ALWAYS_ON>; + resets = <&cpg 714>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + #address-cells = <1>; + #size-cells = <0>; + + reg = <0>; + + csi20_in: endpoint@0 { + reg = <0>; + clock-lanes = <0>; + data-lanes = <1>; + remote-endpoint = <&adv7482_txb>; + }; + }; + + port@1 { + #address-cells = <1>; + #size-cells = <0>; + + reg = <1>; + + csi20vin0: endpoint@0 { + reg = <0>; + remote-endpoint = <&vin0csi20>; + }; + csi20vin1: endpoint@1 { + reg = <1>; + remote-endpoint = <&vin1csi20>; + }; + csi20vin2: endpoint@2 { + reg = <2>; + remote-endpoint = <&vin2csi20>; + }; + csi20vin3: endpoint@3 { + reg = <3>; + remote-endpoint = <&vin3csi20>; + }; + csi20vin4: endpoint@4 { + reg = <4>; + remote-endpoint = <&vin4csi20>; + }; + csi20vin5: endpoint@5 { + reg = <5>; + remote-endpoint = <&vin5csi20>; + }; + csi20vin6: endpoint@6 { + reg = <6>; + remote-endpoint = <&vin6csi20>; + }; + csi20vin7: endpoint@7 { + reg = <7>; + remote-endpoint = <&vin7csi20>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt b/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt index c7888d6f640874df3607a578e68cb124d0b8cfa8..880d4d70c9fd741ac13101721ced18f04336c373 100644 --- a/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt +++ b/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt @@ -28,7 +28,7 @@ See: Documentation/devicetree/bindings/clock/clock-bindings.txt - pinctrl-names : a pinctrl state named tsin%d-serial or tsin%d-parallel (where %d is tsin-num) must be defined for each tsin child node. - pinctrl-0 : phandle referencing pin configuration for this tsin configuration -See: Documentation/devicetree/bindings/pinctrl/pinctrl-binding.txt +See: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt Required properties (tsin (child) node): diff --git a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt new file mode 100644 index 0000000000000000000000000000000000000000..7d60a50a4fa1c72ec7c6347705b3ffea3898c6b3 --- /dev/null +++ b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt @@ -0,0 +1,26 @@ +NVIDIA Tegra20 MC(Memory Controller) + +Required properties: +- compatible : "nvidia,tegra20-mc" +- reg : Should contain 2 register ranges(address and length); see the + example below. Note that the MC registers are interleaved with the + GART registers, and hence must be represented as multiple ranges. +- interrupts : Should contain MC General interrupt. +- #reset-cells : Should be 1. This cell represents memory client module ID. + The assignments may be found in header file + or in the TRM documentation. + +Example: + mc: memory-controller@7000f000 { + compatible = "nvidia,tegra20-mc"; + reg = <0x7000f000 0x024 + 0x7000f03c 0x3c4>; + interrupts = <0 77 0x04>; + #reset-cells = <1>; + }; + + video-codec@6001a000 { + compatible = "nvidia,tegra20-vde"; + ... + resets = <&mc TEGRA20_MC_RESET_VDE>; + }; diff --git a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.txt b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.txt index 14968b048cd3a5a963cd6f7604bc2ff5c19f4c11..a878b5908a4d0652cce014bbe465137d48056ded 100644 --- a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.txt +++ b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.txt @@ -12,6 +12,9 @@ Required properties: - clock-names: Must include the following entries: - mc: the module's clock input - interrupts: The interrupt outputs from the controller. +- #reset-cells : Should be 1. This cell represents memory client module ID. + The assignments may be found in header file + or in the TRM documentation. Required properties for Tegra30, Tegra114, Tegra124, Tegra132 and Tegra210: - #iommu-cells: Should be 1. The single cell of the IOMMU specifier defines @@ -72,12 +75,14 @@ Example SoC include file: interrupts = ; #iommu-cells = <1>; + #reset-cells = <1>; }; sdhci@700b0000 { compatible = "nvidia,tegra124-sdhci"; ... iommus = <&mc TEGRA_SWGROUP_SDMMC1A>; + resets = <&mc TEGRA124_MC_RESET_SDMMC1>; }; }; diff --git a/Documentation/devicetree/bindings/mfd/arizona.txt b/Documentation/devicetree/bindings/mfd/arizona.txt index bdd017686ea5a082faf99e389b48a624074471cc..a014afb07902350f083d8352bc353e364d7c031a 100644 --- a/Documentation/devicetree/bindings/mfd/arizona.txt +++ b/Documentation/devicetree/bindings/mfd/arizona.txt @@ -50,7 +50,7 @@ Required properties: Optional properties: - - wlf,reset : GPIO specifier for the GPIO controlling /RESET + - reset-gpios : GPIO specifier for the GPIO controlling /RESET - clocks: Should reference the clocks supplied on MCLK1 and MCLK2 - clock-names: Should contains two strings: @@ -70,6 +70,10 @@ Optional properties: Documentation/devicetree/bindings/regulator/regulator.txt (wm5102, wm5110, wm8280, wm8997, wm8998, wm1814) +Deprecated properties: + + - wlf,reset : GPIO specifier for the GPIO controlling /RESET + Also see child specific device properties: Regulator - ../regulator/arizona-regulator.txt Extcon - ../extcon/extcon-arizona.txt diff --git a/Documentation/devicetree/bindings/mfd/as3722.txt b/Documentation/devicetree/bindings/mfd/as3722.txt index 0b2a6099aa20a25cdbddc537c3318b0c20116d52..5297b22107040046a2bf6ddf1213941a3ddc15f8 100644 --- a/Documentation/devicetree/bindings/mfd/as3722.txt +++ b/Documentation/devicetree/bindings/mfd/as3722.txt @@ -46,7 +46,7 @@ is required: Following properties are require if pin control setting is required at boot. - pinctrl-names: A pinctrl state named "default" be defined, using the - bindings in pinctrl/pinctrl-binding.txt. + bindings in pinctrl/pinctrl-bindings.txt. - pinctrl[0...n]: Properties to contain the phandle that refer to different nodes of pin control settings. These nodes represents the pin control setting of state 0 to state n. Each of these diff --git a/Documentation/devicetree/bindings/mfd/axp20x.txt b/Documentation/devicetree/bindings/mfd/axp20x.txt index 9455503b02993109cb0621519654eacadbc2f35a..d1762f3b30afab38a506adece568f146f91fecc4 100644 --- a/Documentation/devicetree/bindings/mfd/axp20x.txt +++ b/Documentation/devicetree/bindings/mfd/axp20x.txt @@ -43,7 +43,7 @@ Optional properties: regulator to drive the OTG VBus, rather then as an input pin which signals whether the board is driving OTG VBus or not. - (axp221 / axp223 / axp813 only) + (axp221 / axp223 / axp803/ axp813 only) - x-powers,master-mode: Boolean (axp806 only). Set this when the PMIC is wired for master mode. The default is slave mode. @@ -132,6 +132,7 @@ FLDO2 : LDO : fldoin-supply : shared supply LDO_IO0 : LDO : ips-supply : GPIO 0 LDO_IO1 : LDO : ips-supply : GPIO 1 RTC_LDO : LDO : ips-supply : always on +DRIVEVBUS : Enable output : drivevbus-supply : external regulator AXP806 regulators, type, and corresponding input supply names: diff --git a/Documentation/devicetree/bindings/mfd/bd9571mwv.txt b/Documentation/devicetree/bindings/mfd/bd9571mwv.txt index 9ab216a851d5619becdf70da0b5a2c14f73026e2..25d1f697eb25c67d9a42dbd7a50dbb3d0af88aaf 100644 --- a/Documentation/devicetree/bindings/mfd/bd9571mwv.txt +++ b/Documentation/devicetree/bindings/mfd/bd9571mwv.txt @@ -25,6 +25,25 @@ Required properties: Each child node is defined using the standard binding for regulators. +Optional properties: + - rohm,ddr-backup-power : Value to use for DDR-Backup Power (default 0). + This is a bitmask that specifies which DDR power + rails need to be kept powered when backup mode is + entered, for system suspend: + - bit 0: DDR0 + - bit 1: DDR1 + - bit 2: DDR0C + - bit 3: DDR1C + These bits match the KEEPON_DDR* bits in the + documentation for the "BKUP Mode Cnt" register. + - rohm,rstbmode-level: The RSTB signal is configured for level mode, to + accommodate a toggle power switch (the RSTBMODE pin is + strapped low). + - rohm,rstbmode-pulse: The RSTB signal is configured for pulse mode, to + accommodate a momentary power switch (the RSTBMODE pin + is strapped high). + The two properties above are mutually exclusive. + Example: pmic: pmic@30 { @@ -36,6 +55,8 @@ Example: #interrupt-cells = <2>; gpio-controller; #gpio-cells = <2>; + rohm,ddr-backup-power = <0xf>; + rohm,rstbmode-pulse; regulators { dvfs: dvfs { diff --git a/Documentation/devicetree/bindings/mfd/da9063.txt b/Documentation/devicetree/bindings/mfd/da9063.txt index 05b21bcb85437198e669ece2725a49c092316264..443e68286957a2c11af4a6660df7cf029902c121 100644 --- a/Documentation/devicetree/bindings/mfd/da9063.txt +++ b/Documentation/devicetree/bindings/mfd/da9063.txt @@ -1,4 +1,4 @@ -* Dialog DA9063 Power Management Integrated Circuit (PMIC) +* Dialog DA9063/DA9063L Power Management Integrated Circuit (PMIC) DA9093 consists of a large and varied group of sub-devices (I2C Only): @@ -6,14 +6,14 @@ Device Supply Names Description ------ ------------ ----------- da9063-regulator : : LDOs & BUCKs da9063-onkey : : On Key -da9063-rtc : : Real-Time Clock +da9063-rtc : : Real-Time Clock (DA9063 only) da9063-watchdog : : Watchdog ====== Required properties: -- compatible : Should be "dlg,da9063" +- compatible : Should be "dlg,da9063" or "dlg,da9063l" - reg : Specifies the I2C slave address (this defaults to 0x58 but it can be modified to match the chip's OTP settings). - interrupt-parent : Specifies the reference to the interrupt controller for @@ -23,8 +23,8 @@ Required properties: Sub-nodes: -- regulators : This node defines the settings for the LDOs and BUCKs. The - DA9063 regulators are bound using their names listed below: +- regulators : This node defines the settings for the LDOs and BUCKs. + The DA9063(L) regulators are bound using their names listed below: bcore1 : BUCK CORE1 bcore2 : BUCK CORE2 @@ -32,16 +32,16 @@ Sub-nodes: bmem : BUCK MEM bio : BUCK IO bperi : BUCK PERI - ldo1 : LDO_1 - ldo2 : LDO_2 + ldo1 : LDO_1 (DA9063 only) + ldo2 : LDO_2 (DA9063 only) ldo3 : LDO_3 - ldo4 : LDO_4 - ldo5 : LDO_5 - ldo6 : LDO_6 + ldo4 : LDO_4 (DA9063 only) + ldo5 : LDO_5 (DA9063 only) + ldo6 : LDO_6 (DA9063 only) ldo7 : LDO_7 ldo8 : LDO_8 ldo9 : LDO_9 - ldo10 : LDO_10 + ldo10 : LDO_10 (DA9063 only) ldo11 : LDO_11 The component follows the standard regulator framework and the bindings @@ -49,8 +49,9 @@ Sub-nodes: Documentation/devicetree/bindings/regulator/regulator.txt - rtc : This node defines settings for the Real-Time Clock associated with - the DA9063. There are currently no entries in this binding, however - compatible = "dlg,da9063-rtc" should be added if a node is created. + the DA9063 only. The RTC is not present in DA9063L. There are currently + no entries in this binding, however compatible = "dlg,da9063-rtc" should + be added if a node is created. - onkey : This node defines the OnKey settings for controlling the key functionality of the device. The node should contain the compatible property @@ -65,8 +66,9 @@ Sub-nodes: and KEY_SLEEP. - watchdog : This node defines settings for the Watchdog timer associated - with the DA9063. There are currently no entries in this binding, however - compatible = "dlg,da9063-watchdog" should be added if a node is created. + with the DA9063 and DA9063L. There are currently no entries in this + binding, however compatible = "dlg,da9063-watchdog" should be added + if a node is created. Example: diff --git a/Documentation/devicetree/bindings/mfd/motorola-cpcap.txt b/Documentation/devicetree/bindings/mfd/motorola-cpcap.txt index 15bc885f9df45a1de3d975d68deb9dd8cc55f050..c639705a98ef67c38a5e79b43b78cc77de76f326 100644 --- a/Documentation/devicetree/bindings/mfd/motorola-cpcap.txt +++ b/Documentation/devicetree/bindings/mfd/motorola-cpcap.txt @@ -12,6 +12,30 @@ Required properties: - spi-max-frequency : Typically set to 3000000 - spi-cs-high : SPI chip select direction +Optional subnodes: + +The sub-functions of CPCAP get their own node with their own compatible values, +which are described in the following files: + +- ../power/supply/cpcap-battery.txt +- ../power/supply/cpcap-charger.txt +- ../regulator/cpcap-regulator.txt +- ../phy/phy-cpcap-usb.txt +- ../input/cpcap-pwrbutton.txt +- ../rtc/cpcap-rtc.txt +- ../leds/leds-cpcap.txt +- ../iio/adc/cpcap-adc.txt + +The only exception is the audio codec. Instead of a compatible value its +node must be named "audio-codec". + +Required properties for the audio-codec subnode: + +- #sound-dai-cells = <1>; + +The audio-codec provides two DAIs. The first one is connected to the +Stereo HiFi DAC and the second one is connected to the Voice DAC. + Example: &mcspi1 { @@ -26,6 +50,24 @@ Example: #size-cells = <0>; spi-max-frequency = <3000000>; spi-cs-high; + + audio-codec { + #sound-dai-cells = <1>; + + /* HiFi */ + port@0 { + endpoint { + remote-endpoint = <&cpu_dai1>; + }; + }; + + /* Voice */ + port@1 { + endpoint { + remote-endpoint = <&cpu_dai2>; + }; + }; + }; }; }; diff --git a/Documentation/devicetree/bindings/mfd/mt6397.txt b/Documentation/devicetree/bindings/mfd/mt6397.txt index 522a3bbf1bac0482018970f966a06b58aa43faf5..0ebd08af777d4d461c13e1d414627ea209d4af81 100644 --- a/Documentation/devicetree/bindings/mfd/mt6397.txt +++ b/Documentation/devicetree/bindings/mfd/mt6397.txt @@ -7,11 +7,12 @@ MT6397/MT6323 is a multifunction device with the following sub modules: - GPIO - Clock - LED +- Keys It is interfaced to host controller using SPI interface by a proprietary hardware called PMIC wrapper or pwrap. MT6397/MT6323 MFD is a child device of pwrap. See the following for pwarp node definitions: -Documentation/devicetree/bindings/soc/pwrap.txt +Documentation/devicetree/bindings/soc/mediatek/pwrap.txt This document describes the binding for MFD device and its sub module. @@ -40,6 +41,11 @@ Optional subnodes: - compatible: "mediatek,mt6323-led" see Documentation/devicetree/bindings/leds/leds-mt6323.txt +- keys + Required properties: + - compatible: "mediatek,mt6397-keys" or "mediatek,mt6323-keys" + see Documentation/devicetree/bindings/input/mtk-pmic-keys.txt + Example: pwrap: pwrap@1000f000 { compatible = "mediatek,mt8135-pwrap"; diff --git a/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.txt b/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.txt index 6ac06c1b9aec8908095c0b2fc22af1882334293c..143706222a51cfc8c8785199f3a9d5b1d53eec33 100644 --- a/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.txt +++ b/Documentation/devicetree/bindings/mfd/qcom,spmi-pmic.txt @@ -29,6 +29,9 @@ Required properties: "qcom,pm8916", "qcom,pm8004", "qcom,pm8909", + "qcom,pm8998", + "qcom,pmi8998", + "qcom,pm8005", or generalized "qcom,spmi-pmic". - reg: Specifies the SPMI USID slave address for this device. For more information see: diff --git a/Documentation/devicetree/bindings/mfd/stm32-timers.txt b/Documentation/devicetree/bindings/mfd/stm32-timers.txt index 1db6e0057a638e09a5346956a70620c31276fdf6..0e900b52e8959cf2653c554f6f179bb2a44adab6 100644 --- a/Documentation/devicetree/bindings/mfd/stm32-timers.txt +++ b/Documentation/devicetree/bindings/mfd/stm32-timers.txt @@ -19,6 +19,11 @@ Required parameters: Optional parameters: - resets: Phandle to the parent reset controller. See ../reset/st,stm32-rcc.txt +- dmas: List of phandle to dma channels that can be used for + this timer instance. There may be up to 7 dma channels. +- dma-names: List of dma names. Must match 'dmas' property. Valid + names are: "ch1", "ch2", "ch3", "ch4", "up", "trig", + "com". Optional subnodes: - pwm: See ../pwm/pwm-stm32.txt @@ -44,3 +49,18 @@ Example: reg = <0>; }; }; + +Example with all dmas: + timer@40010000 { + ... + dmas = <&dmamux1 11 0x400 0x0>, + <&dmamux1 12 0x400 0x0>, + <&dmamux1 13 0x400 0x0>, + <&dmamux1 14 0x400 0x0>, + <&dmamux1 15 0x400 0x0>, + <&dmamux1 16 0x400 0x0>, + <&dmamux1 17 0x400 0x0>; + dma-names = "ch1", "ch2", "ch3", "ch4", "up", "trig", "com"; + ... + child nodes... + }; diff --git a/Documentation/devicetree/bindings/mfd/sun6i-prcm.txt b/Documentation/devicetree/bindings/mfd/sun6i-prcm.txt index dd2c06540485bd94c8ee7197d5e052f4dbef4de1..daa091c2e67ba06063e778ce0ca0c012c808a590 100644 --- a/Documentation/devicetree/bindings/mfd/sun6i-prcm.txt +++ b/Documentation/devicetree/bindings/mfd/sun6i-prcm.txt @@ -8,8 +8,8 @@ Required properties: - reg: The PRCM registers range The prcm node may contain several subdevices definitions: - - see Documentation/devicetree/clk/sunxi.txt for clock devices - - see Documentation/devicetree/reset/allwinner,sunxi-clock-reset.txt for reset + - see Documentation/devicetree/bindings/clock/sunxi.txt for clock devices + - see Documentation/devicetree/bindings/reset/allwinner,sunxi-clock-reset.txt for reset controller devices diff --git a/Documentation/devicetree/bindings/mips/brcm/soc.txt b/Documentation/devicetree/bindings/mips/brcm/soc.txt index 356c29789cf54862e1ece93dc40449e221481304..3a66d3c483e1aad12298fcf297767931db09051a 100644 --- a/Documentation/devicetree/bindings/mips/brcm/soc.txt +++ b/Documentation/devicetree/bindings/mips/brcm/soc.txt @@ -152,7 +152,7 @@ Required properties: - compatible : should contain one of: "brcm,bcm7425-timers" "brcm,bcm7429-timers" - "brcm,bcm7435-timers and + "brcm,bcm7435-timers" and "brcm,brcmstb-timers" - reg : the timers register range - interrupts : the interrupt line for this timer block diff --git a/Documentation/devicetree/bindings/mips/lantiq/rcu.txt b/Documentation/devicetree/bindings/mips/lantiq/rcu.txt index a086f1e1cdd78809f87f145dba13adaf68849602..7f0822b4beae404f630ba9d222b8560d5440fcec 100644 --- a/Documentation/devicetree/bindings/mips/lantiq/rcu.txt +++ b/Documentation/devicetree/bindings/mips/lantiq/rcu.txt @@ -61,7 +61,6 @@ Example of the RCU bindings on a xRX200 SoC: usb_phy0: usb2-phy@18 { compatible = "lantiq,xrx200-usb2-phy"; reg = <0x18 4>, <0x38 4>; - status = "disabled"; resets = <&reset1 4 4>, <&reset0 4 4>; reset-names = "phy", "ctrl"; @@ -71,7 +70,6 @@ Example of the RCU bindings on a xRX200 SoC: usb_phy1: usb2-phy@34 { compatible = "lantiq,xrx200-usb2-phy"; reg = <0x34 4>, <0x3C 4>; - status = "disabled"; resets = <&reset1 5 4>, <&reset0 4 4>; reset-names = "phy", "ctrl"; diff --git a/Documentation/devicetree/bindings/mmc/amlogic,meson-gx.txt b/Documentation/devicetree/bindings/mmc/amlogic,meson-gx.txt index 50bf611a4d2c01a3488c538a17334af3d1706093..13e70409e8ac6250676eba2d6e31be6187410168 100644 --- a/Documentation/devicetree/bindings/mmc/amlogic,meson-gx.txt +++ b/Documentation/devicetree/bindings/mmc/amlogic,meson-gx.txt @@ -12,6 +12,7 @@ Required properties: - "amlogic,meson-gxbb-mmc" - "amlogic,meson-gxl-mmc" - "amlogic,meson-gxm-mmc" + - "amlogic,meson-axg-mmc" - clocks : A list of phandle + clock-specifier pairs for the clocks listed in clock-names. - clock-names: Should contain the following: "core" - Main peripheral bus clock @@ -19,6 +20,7 @@ Required properties: "clkin1" - Other parent clock of internal mux The driver has an internal mux clock which switches between clkin0 and clkin1 depending on the clock rate requested by the MMC core. +- resets : phandle of the internal reset line Example: @@ -29,4 +31,5 @@ Example: clocks = <&clkc CLKID_SD_EMMC_A>, <&xtal>, <&clkc CLKID_FCLK_DIV2>; clock-names = "core", "clkin0", "clkin1"; pinctrl-0 = <&emmc_pins>; + resets = <&reset RESET_SD_EMMC_A>; }; diff --git a/Documentation/devicetree/bindings/mmc/bluefield-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/bluefield-dw-mshc.txt new file mode 100644 index 0000000000000000000000000000000000000000..b0f0999ea1a9681317c097674dc70e97e80709bd --- /dev/null +++ b/Documentation/devicetree/bindings/mmc/bluefield-dw-mshc.txt @@ -0,0 +1,29 @@ +* Mellanox Bluefield SoC specific extensions to the Synopsys Designware + Mobile Storage Host Controller + +Read synopsys-dw-mshc.txt for more details + +The Synopsys designware mobile storage host controller is used to interface +a SoC with storage medium such as eMMC or SD/MMC cards. This file documents +differences between the core Synopsys dw mshc controller properties described +by synopsys-dw-mshc.txt and the properties used by the Mellanox Bluefield SoC +specific extensions to the Synopsys Designware Mobile Storage Host Controller. + +Required Properties: + +* compatible: should be one of the following. + - "mellanox,bluefield-dw-mshc": for controllers with Mellanox Bluefield SoC + specific extensions. + +Example: + + /* Mellanox Bluefield SoC MMC */ + mmc@6008000 { + compatible = "mellanox,bluefield-dw-mshc"; + reg = <0x6008000 0x400>; + interrupts = <32>; + fifo-depth = <0x100>; + clock-frequency = <24000000>; + bus-width = <8>; + cap-mmc-highspeed; + }; diff --git a/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt b/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt index a58c173b7ab9882091fe26e378ce655610f17455..0419a63f73a01457a88e79bb9b61ff4b606fd011 100644 --- a/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt +++ b/Documentation/devicetree/bindings/mmc/exynos-dw-mshc.txt @@ -62,7 +62,7 @@ Required properties for a slot (Deprecated - Recommend to use one slot per host) rest of the gpios (depending on the bus-width property) are the data lines in no particular order. The format of the gpio specifier depends on the gpio controller. -(Deprecated - Refer to Documentation/devicetree/binding/pinctrl/samsung-pinctrl.txt) +(Deprecated - Refer to Documentation/devicetree/bindings/pinctrl/samsung-pinctrl.txt) Example: diff --git a/Documentation/devicetree/bindings/mmc/jz4740.txt b/Documentation/devicetree/bindings/mmc/jz4740.txt new file mode 100644 index 0000000000000000000000000000000000000000..7cd8c432d7c8794eea64570aa233ea615432a6f5 --- /dev/null +++ b/Documentation/devicetree/bindings/mmc/jz4740.txt @@ -0,0 +1,38 @@ +* Ingenic JZ47xx MMC controllers + +This file documents the device tree properties used for the MMC controller in +Ingenic JZ4740/JZ4780 SoCs. These are in addition to the core MMC properties +described in mmc.txt. + +Required properties: +- compatible: Should be one of the following: + - "ingenic,jz4740-mmc" for the JZ4740 + - "ingenic,jz4780-mmc" for the JZ4780 +- reg: Should contain the MMC controller registers location and length. +- interrupts: Should contain the interrupt specifier of the MMC controller. +- clocks: Clock for the MMC controller. + +Optional properties: +- dmas: List of DMA specifiers with the controller specific format + as described in the generic DMA client binding. A tx and rx + specifier is required. +- dma-names: RX and TX DMA request names. + Should be "rx" and "tx", in that order. + +For additional details on DMA client bindings see ../dma/dma.txt. + +Example: + +mmc0: mmc@13450000 { + compatible = "ingenic,jz4780-mmc"; + reg = <0x13450000 0x1000>; + + interrupt-parent = <&intc>; + interrupts = <37>; + + clocks = <&cgu JZ4780_CLK_MSC0>; + clock-names = "mmc"; + + dmas = <&dma JZ4780_DMA_MSC0_RX 0xffffffff>, <&dma JZ4780_DMA_MSC0_TX 0xffffffff>; + dma-names = "rx", "tx"; +}; diff --git a/Documentation/devicetree/bindings/mmc/microchip,sdhci-pic32.txt b/Documentation/devicetree/bindings/mmc/microchip,sdhci-pic32.txt index 3149297b3933293c35647ef8ae215684f2d1fc5c..f064528effed31f30d1d1c6e0b49c02e215d99af 100644 --- a/Documentation/devicetree/bindings/mmc/microchip,sdhci-pic32.txt +++ b/Documentation/devicetree/bindings/mmc/microchip,sdhci-pic32.txt @@ -12,7 +12,7 @@ Required properties: See: Documentation/devicetree/bindings/clock/clock-bindings.txt - pinctrl-names: A pinctrl state names "default" must be defined. - pinctrl-0: Phandle referencing pin configuration of the SDHCI controller. - See: Documentation/devicetree/bindings/pinctrl/pinctrl-binding.txt + See: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt Example: diff --git a/Documentation/devicetree/bindings/mmc/mmc.txt b/Documentation/devicetree/bindings/mmc/mmc.txt index 467cd7b147ced43de21bbe12c8ad5d414f758b85..f5a0923b34ca1e5dfd11f3c9ba03792a6963b5c7 100644 --- a/Documentation/devicetree/bindings/mmc/mmc.txt +++ b/Documentation/devicetree/bindings/mmc/mmc.txt @@ -19,6 +19,8 @@ Optional properties: - wp-gpios: Specify GPIOs for write protection, see gpio binding - cd-inverted: when present, polarity on the CD line is inverted. See the note below for the case, when a GPIO is used for the CD line +- cd-debounce-delay-ms: Set delay time before detecting card after card insert interrupt. + It's only valid when cd-gpios is present. - wp-inverted: when present, polarity on the WP line is inverted. See the note below for the case, when a GPIO is used for the WP line - disable-wp: When set no physical WP line is present. This property should @@ -56,6 +58,10 @@ Optional properties: - fixed-emmc-driver-type: for non-removable eMMC, enforce this driver type. The value is the driver type as specified in the eMMC specification (table 206 in spec version 5.1). +- post-power-on-delay-ms : It was invented for MMC pwrseq-simple which could + be referred to mmc-pwrseq-simple.txt. But now it's reused as a tunable delay + waiting for I/O signalling and card power supply to be stable, regardless of + whether pwrseq-simple is used. Default to 10ms if no available. *NOTE* on CD and WP polarity. To use common for all SD/MMC host controllers line polarity properties, we have to fix the meaning of the "normal" and "inverted" diff --git a/Documentation/devicetree/bindings/mmc/sdhci-omap.txt b/Documentation/devicetree/bindings/mmc/sdhci-omap.txt index 51775a372c0651eca1d063d5af0737cc7eac49bc..393848c2138e86f5cc026f1fd04cc68e85916c79 100644 --- a/Documentation/devicetree/bindings/mmc/sdhci-omap.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-omap.txt @@ -4,7 +4,14 @@ Refer to mmc.txt for standard MMC bindings. Required properties: - compatible: Should be "ti,dra7-sdhci" for DRA7 and DRA72 controllers + Should be "ti,k2g-sdhci" for K2G - ti,hwmods: Must be "mmc", is controller instance starting 1 + (Not required for K2G). +- pinctrl-names: Should be subset of "default", "hs", "sdr12", "sdr25", "sdr50", + "ddr50-rev11", "sdr104-rev11", "ddr50", "sdr104", + "ddr_1_8v-rev11", "ddr_1_8v" or "ddr_3_3v", "hs200_1_8v-rev11", + "hs200_1_8v", +- pinctrl- : Pinctrl states as described in bindings/pinctrl/pinctrl-bindings.txt Example: mmc1: mmc@4809c000 { diff --git a/Documentation/devicetree/bindings/mmc/sdhci-st.txt b/Documentation/devicetree/bindings/mmc/sdhci-st.txt index 6b3d40ca395eb47c59696766f45cf22a6fc8de77..ccf82b4ee838e0c86f5bfef9b1f14b19d3323fc4 100644 --- a/Documentation/devicetree/bindings/mmc/sdhci-st.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-st.txt @@ -20,7 +20,7 @@ Required properties: - pinctrl-names: A pinctrl state names "default" must be defined. - pinctrl-0: Phandle referencing pin configuration of the sd/emmc controller. - See: Documentation/devicetree/bindings/pinctrl/pinctrl-binding.txt + See: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt - reg: This must provide the host controller base address and it can also contain the FlashSS Top register for TX/RX delay used by the driver diff --git a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt index 2d5287eeed953b93314c91a9560db28177e46867..839f469f4525bbe2799c41798e30a22d0377b34d 100644 --- a/Documentation/devicetree/bindings/mmc/tmio_mmc.txt +++ b/Documentation/devicetree/bindings/mmc/tmio_mmc.txt @@ -26,6 +26,8 @@ Required properties: "renesas,sdhi-r8a7794" - SDHI IP on R8A7794 SoC "renesas,sdhi-r8a7795" - SDHI IP on R8A7795 SoC "renesas,sdhi-r8a7796" - SDHI IP on R8A7796 SoC + "renesas,sdhi-r8a77965" - SDHI IP on R8A77965 SoC + "renesas,sdhi-r8a77980" - SDHI IP on R8A77980 SoC "renesas,sdhi-r8a77995" - SDHI IP on R8A77995 SoC "renesas,sdhi-shmobile" - a generic sh-mobile SDHI controller "renesas,rcar-gen1-sdhi" - a generic R-Car Gen1 SDHI controller @@ -67,7 +69,6 @@ Example: R8A7790 (R-Car H2) SDHI controller nodes max-frequency = <195000000>; power-domains = <&sysc R8A7790_PD_ALWAYS_ON>; resets = <&cpg 314>; - status = "disabled"; }; sdhi1: sd@ee120000 { @@ -81,7 +82,6 @@ Example: R8A7790 (R-Car H2) SDHI controller nodes max-frequency = <195000000>; power-domains = <&sysc R8A7790_PD_ALWAYS_ON>; resets = <&cpg 313>; - status = "disabled"; }; sdhi2: sd@ee140000 { @@ -95,7 +95,6 @@ Example: R8A7790 (R-Car H2) SDHI controller nodes max-frequency = <97500000>; power-domains = <&sysc R8A7790_PD_ALWAYS_ON>; resets = <&cpg 312>; - status = "disabled"; }; sdhi3: sd@ee160000 { @@ -109,5 +108,4 @@ Example: R8A7790 (R-Car H2) SDHI controller nodes max-frequency = <97500000>; power-domains = <&sysc R8A7790_PD_ALWAYS_ON>; resets = <&cpg 311>; - status = "disabled"; }; diff --git a/Documentation/devicetree/bindings/mtd/gpmi-nand.txt b/Documentation/devicetree/bindings/mtd/gpmi-nand.txt index b289ef3c1b7e4a8ae798fc1b7f7db38880b3df23..393588385c6e50cd2e8dbf0469f0ba17efaab1e8 100644 --- a/Documentation/devicetree/bindings/mtd/gpmi-nand.txt +++ b/Documentation/devicetree/bindings/mtd/gpmi-nand.txt @@ -47,6 +47,11 @@ Optional properties: partitions written from Linux with this feature turned on may not be accessible by the BootROM code. + - nand-ecc-strength: integer representing the number of bits to correct + per ECC step. Needs to be a multiple of 2. + - nand-ecc-step-size: integer representing the number of data bytes + that are covered by a single ECC step. The driver + supports 512 and 1024. The device tree may optionally contain sub-nodes describing partitions of the address space. See partition.txt for more detail. diff --git a/Documentation/devicetree/bindings/powerpc/4xx/ndfc.txt b/Documentation/devicetree/bindings/mtd/ibm,ndfc.txt similarity index 100% rename from Documentation/devicetree/bindings/powerpc/4xx/ndfc.txt rename to Documentation/devicetree/bindings/mtd/ibm,ndfc.txt diff --git a/Documentation/devicetree/bindings/mtd/mtk-nand.txt b/Documentation/devicetree/bindings/mtd/mtk-nand.txt index 1c88526dedfc2a370d0a8ccf13c4247c63dacb78..4d3ec5e4ff8a313d7e359e1d119e70dbf3643d8e 100644 --- a/Documentation/devicetree/bindings/mtd/mtk-nand.txt +++ b/Documentation/devicetree/bindings/mtd/mtk-nand.txt @@ -20,7 +20,6 @@ Required NFI properties: - interrupts: Interrupts of NFI. - clocks: NFI required clocks. - clock-names: NFI clocks internal name. -- status: Disabled default. Then set "okay" by platform. - ecc-engine: Required ECC Engine node. - #address-cells: NAND chip index, should be 1. - #size-cells: Should be 0. @@ -34,7 +33,6 @@ Example: clocks = <&pericfg CLK_PERI_NFI>, <&pericfg CLK_PERI_NFI_PAD>; clock-names = "nfi_clk", "pad_clk"; - status = "disabled"; ecc-engine = <&bch>; #address-cells = <1>; #size-cells = <0>; @@ -50,14 +48,19 @@ Optional: - nand-on-flash-bbt: Store BBT on NAND Flash. - nand-ecc-mode: the NAND ecc mode (check driver for supported modes) - nand-ecc-step-size: Number of data bytes covered by a single ECC step. - valid values: 512 and 1024. + valid values: + 512 and 1024 on mt2701 and mt2712. + 512 only on mt7622. 1024 is recommended for large page NANDs. - nand-ecc-strength: Number of bits to correct per ECC step. - The valid values that the controller supports are: 4, 6, - 8, 10, 12, 14, 16, 18, 20, 22, 24, 28, 32, 36, 40, 44, - 48, 52, 56, 60. + The valid values that each controller supports: + mt2701: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 28, + 32, 36, 40, 44, 48, 52, 56, 60. + mt2712: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 28, + 32, 36, 40, 44, 48, 52, 56, 60, 68, 72, 80. + mt7622: 4, 6, 8, 10, 12, 14, 16. The strength should be calculated as follows: - E = (S - F) * 8 / 14 + E = (S - F) * 8 / B S = O / (P / Q) E : nand-ecc-strength. S : spare size per sector. @@ -66,6 +69,15 @@ Optional: O : oob size. P : page size. Q : nand-ecc-step-size. + B : number of parity bits needed to correct + 1 bitflip. + According to MTK NAND controller design, + this number depends on max ecc step size + that MTK NAND controller supports. + If max ecc step size supported is 1024, + then it should be always 14. And if max + ecc step size is 512, then it should be + always 13. If the result does not match any one of the listed choices above, please select the smaller valid value from the list. @@ -152,7 +164,6 @@ Required BCH properties: - interrupts: Interrupts of ECC. - clocks: ECC required clocks. - clock-names: ECC clocks internal name. -- status: Disabled default. Then set "okay" by platform. Example: @@ -162,5 +173,4 @@ Example: interrupts = ; clocks = <&pericfg CLK_PERI_NFI_ECC>; clock-names = "nfiecc_clk"; - status = "disabled"; }; diff --git a/Documentation/devicetree/bindings/mtd/partition.txt b/Documentation/devicetree/bindings/mtd/partition.txt index 36f3b769a62675cccac1a23ec8287e08bd91653e..a8f382642ba9613764ffee288596f7db65446758 100644 --- a/Documentation/devicetree/bindings/mtd/partition.txt +++ b/Documentation/devicetree/bindings/mtd/partition.txt @@ -14,7 +14,7 @@ method is used for a given flash device. To describe the method there should be a subnode of the flash device that is named 'partitions'. It must have a 'compatible' property, which is used to identify the method to use. -We currently only document a binding for fixed layouts. +Available bindings are listed in the "partitions" subdirectory. Fixed Partitions diff --git a/Documentation/devicetree/bindings/mtd/partitions/brcm,bcm947xx-cfe-partitions.txt b/Documentation/devicetree/bindings/mtd/partitions/brcm,bcm947xx-cfe-partitions.txt new file mode 100644 index 0000000000000000000000000000000000000000..1d61a029395e76eac9b5cd90da7b44f63c514c17 --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/partitions/brcm,bcm947xx-cfe-partitions.txt @@ -0,0 +1,42 @@ +Broadcom BCM47xx Partitions +=========================== + +Broadcom is one of hardware manufacturers providing SoCs (BCM47xx) used in +home routers. Their BCM947xx boards using CFE bootloader have several partitions +without any on-flash partition table. On some devices their sizes and/or +meanings can also vary so fixed partitioning can't be used. + +Discovering partitions on these devices is possible thanks to having a special +header and/or magic signature at the beginning of each of them. They are also +block aligned which is important for determinig a size. + +Most of partitions use ASCII text based magic for determining a type. More +complex partitions (like TRX with its HDR0 magic) may include extra header +containing some details, including a length. + +A list of supported partitions includes: +1) Bootloader with Broadcom's CFE (Common Firmware Environment) +2) NVRAM with configuration/calibration data +3) Device manufacturer's data with some default values (e.g. SSIDs) +4) TRX firmware container which can hold up to 4 subpartitions +5) Backup TRX firmware used after failed upgrade + +As mentioned earlier, role of some partitions may depend on extra configuration. +For example both: main firmware and backup firmware use the same TRX format with +the same header. To distinguish currently used firmware a CFE's environment +variable "bootpartition" is used. + + +Devices using Broadcom partitions described above should should have flash node +with a subnode named "partitions" using following properties: + +Required properties: +- compatible : (required) must be "brcm,bcm947xx-cfe-partitions" + +Example: + +flash@0 { + partitions { + compatible = "brcm,bcm947xx-cfe-partitions"; + }; +}; diff --git a/Documentation/devicetree/bindings/mtd/sunxi-nand.txt b/Documentation/devicetree/bindings/mtd/sunxi-nand.txt index 0734f03bf3d311d051a88dde9102a210206690e4..dcd5a5d80dc0ad535298f58d492607507752c02b 100644 --- a/Documentation/devicetree/bindings/mtd/sunxi-nand.txt +++ b/Documentation/devicetree/bindings/mtd/sunxi-nand.txt @@ -22,8 +22,6 @@ Optional properties: - reset : phandle + reset specifier pair - reset-names : must contain "ahb" - allwinner,rb : shall contain the native Ready/Busy ids. - or -- rb-gpios : shall contain the gpios used as R/B pins. - nand-ecc-mode : one of the supported ECC modes ("hw", "soft", "soft_bch" or "none") diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt index cfe8f64eca4fbea222f53cfd78b62d7df4e10020..3ceeb8de11963572cc1bd8ce324433e0dbf6bd03 100644 --- a/Documentation/devicetree/bindings/net/dsa/dsa.txt +++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt @@ -82,8 +82,6 @@ linked into one DSA cluster. switch0: switch0@0 { compatible = "marvell,mv88e6085"; - #address-cells = <1>; - #size-cells = <0>; reg = <0>; dsa,member = <0 0>; @@ -135,8 +133,6 @@ linked into one DSA cluster. switch1: switch1@0 { compatible = "marvell,mv88e6085"; - #address-cells = <1>; - #size-cells = <0>; reg = <0>; dsa,member = <0 1>; @@ -204,8 +200,6 @@ linked into one DSA cluster. switch2: switch2@0 { compatible = "marvell,mv88e6085"; - #address-cells = <1>; - #size-cells = <0>; reg = <0>; dsa,member = <0 2>; diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt b/Documentation/devicetree/bindings/net/dsa/ksz.txt index fd23904ac68e929adcc6657b6cde29ea4206fd7e..a700943218cac9892a58df1c51b603e5003d029f 100644 --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt @@ -6,7 +6,7 @@ Required properties: - compatible: For external switch chips, compatible string must be exactly one of: "microchip,ksz9477" -See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional +See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional required and optional properties. Examples: diff --git a/Documentation/devicetree/bindings/net/dsa/mt7530.txt b/Documentation/devicetree/bindings/net/dsa/mt7530.txt index a9bc27b93ee326fe2ac665ffc143b0e3e345c0a0..aa3527f71fdc7ee9377b95fba69adad5d2264955 100644 --- a/Documentation/devicetree/bindings/net/dsa/mt7530.txt +++ b/Documentation/devicetree/bindings/net/dsa/mt7530.txt @@ -31,7 +31,7 @@ Required properties for the child nodes within ports container: - phy-mode: String, must be either "trgmii" or "rgmii" for port labeled "cpu". -See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional +See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional required, optional properties and how the integrated switch subnodes must be specified. diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt index 3d6d5fa0c4d5fc670d8bec94e53a0f07a0a2e013..cfe724398a12b68204abe29e5989547a7066b01a 100644 --- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt +++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt @@ -7,6 +7,7 @@ Required properties: - compatible: must be one of the following string: "allwinner,sun8i-a83t-emac" "allwinner,sun8i-h3-emac" + "allwinner,sun8i-r40-gmac" "allwinner,sun8i-v3s-emac" "allwinner,sun50i-a64-emac" - reg: address and length of the register for the device. @@ -20,18 +21,18 @@ Required properties: - phy-handle: See ethernet.txt - #address-cells: shall be 1 - #size-cells: shall be 0 -- syscon: A phandle to the syscon of the SoC with one of the following - compatible string: - - allwinner,sun8i-h3-system-controller - - allwinner,sun8i-v3s-system-controller - - allwinner,sun50i-a64-system-controller - - allwinner,sun8i-a83t-system-controller +- syscon: A phandle to the device containing the EMAC or GMAC clock register Optional properties: -- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0) -- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0) -Both delay properties need to be a multiple of 100. They control the delay for -external PHY. +- allwinner,tx-delay-ps: TX clock delay chain value in ps. + Range is 0-700. Default is 0. + Unavailable for allwinner,sun8i-r40-gmac +- allwinner,rx-delay-ps: RX clock delay chain value in ps. + Range is 0-3100. Default is 0. + Range is 0-700 for allwinner,sun8i-r40-gmac +Both delay properties need to be a multiple of 100. They control the +clock delay for external RGMII PHY. They do not apply to the internal +PHY or external non-RGMII PHYs. Optional properties for the following compatibles: - "allwinner,sun8i-h3-emac", diff --git a/Documentation/devicetree/bindings/net/fsl-fman.txt b/Documentation/devicetree/bindings/net/fsl-fman.txt index df873d1f3b7c598b6c30721d3eec915a20ea8621..f8c33890bc2970e08bf44934835a9b8c464675f1 100644 --- a/Documentation/devicetree/bindings/net/fsl-fman.txt +++ b/Documentation/devicetree/bindings/net/fsl-fman.txt @@ -238,7 +238,7 @@ PROPERTIES Must include one of the following: - "fsl,fman-dtsec" for dTSEC MAC - "fsl,fman-xgec" for XGEC MAC - - "fsl,fman-memac for mEMAC MAC + - "fsl,fman-memac" for mEMAC MAC - cell-index Usage: required diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt index 79bf352e659cd1b999603317a62a7ec62fbac0b6..047bdf7bdd2fa3e6dfad8ab957f478fbebb794a1 100644 --- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt +++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt @@ -86,70 +86,4 @@ Example: * Gianfar PTP clock nodes -General Properties: - - - compatible Should be "fsl,etsec-ptp" - - reg Offset and length of the register set for the device - - interrupts There should be at least two interrupts. Some devices - have as many as four PTP related interrupts. - -Clock Properties: - - - fsl,cksel Timer reference clock source. - - fsl,tclk-period Timer reference clock period in nanoseconds. - - fsl,tmr-prsc Prescaler, divides the output clock. - - fsl,tmr-add Frequency compensation value. - - fsl,tmr-fiper1 Fixed interval period pulse generator. - - fsl,tmr-fiper2 Fixed interval period pulse generator. - - fsl,max-adj Maximum frequency adjustment in parts per billion. - - These properties set the operational parameters for the PTP - clock. You must choose these carefully for the clock to work right. - Here is how to figure good values: - - TimerOsc = selected reference clock MHz - tclk_period = desired clock period nanoseconds - NominalFreq = 1000 / tclk_period MHz - FreqDivRatio = TimerOsc / NominalFreq (must be greater that 1.0) - tmr_add = ceil(2^32 / FreqDivRatio) - OutputClock = NominalFreq / tmr_prsc MHz - PulseWidth = 1 / OutputClock microseconds - FiperFreq1 = desired frequency in Hz - FiperDiv1 = 1000000 * OutputClock / FiperFreq1 - tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period - max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1 - - The calculation for tmr_fiper2 is the same as for tmr_fiper1. The - driver expects that tmr_fiper1 will be correctly set to produce a 1 - Pulse Per Second (PPS) signal, since this will be offered to the PPS - subsystem to synchronize the Linux clock. - - Reference clock source is determined by the value, which is holded - in CKSEL bits in TMR_CTRL register. "fsl,cksel" property keeps the - value, which will be directly written in those bits, that is why, - according to reference manual, the next clock sources can be used: - - <0> - external high precision timer reference clock (TSEC_TMR_CLK - input is used for this purpose); - <1> - eTSEC system clock; - <2> - eTSEC1 transmit clock; - <3> - RTC clock input. - - When this attribute is not used, eTSEC system clock will serve as - IEEE 1588 timer reference clock. - -Example: - - ptp_clock@24e00 { - compatible = "fsl,etsec-ptp"; - reg = <0x24E00 0xB0>; - interrupts = <12 0x8 13 0x8>; - interrupt-parent = < &ipic >; - fsl,cksel = <1>; - fsl,tclk-period = <10>; - fsl,tmr-prsc = <100>; - fsl,tmr-add = <0x999999A4>; - fsl,tmr-fiper1 = <0x3B9AC9F6>; - fsl,tmr-fiper2 = <0x00018696>; - fsl,max-adj = <659999998>; - }; +Refer to Documentation/devicetree/bindings/ptp/ptp-qoriq.txt diff --git a/Documentation/devicetree/bindings/powerpc/4xx/emac.txt b/Documentation/devicetree/bindings/net/ibm,emac.txt similarity index 100% rename from Documentation/devicetree/bindings/powerpc/4xx/emac.txt rename to Documentation/devicetree/bindings/net/ibm,emac.txt diff --git a/Documentation/devicetree/bindings/net/microchip,lan78xx.txt b/Documentation/devicetree/bindings/net/microchip,lan78xx.txt new file mode 100644 index 0000000000000000000000000000000000000000..76786a0f6d3d7c06d5ce09ba15c311621def50cf --- /dev/null +++ b/Documentation/devicetree/bindings/net/microchip,lan78xx.txt @@ -0,0 +1,54 @@ +Microchip LAN78xx Gigabit Ethernet controller + +The LAN78XX devices are usually configured by programming their OTP or with +an external EEPROM, but some platforms (e.g. Raspberry Pi 3 B+) have neither. +The Device Tree properties, if present, override the OTP and EEPROM. + +Required properties: +- compatible: Should be one of "usb424,7800", "usb424,7801" or "usb424,7850". + +Optional properties: +- local-mac-address: see ethernet.txt +- mac-address: see ethernet.txt + +Optional properties of the embedded PHY: +- microchip,led-modes: a 0..4 element vector, with each element configuring + the operating mode of an LED. Omitted LEDs are turned off. Allowed values + are defined in "include/dt-bindings/net/microchip-lan78xx.h". + +Example: + +/* Based on the configuration for a Raspberry Pi 3 B+ */ +&usb { + usb-port@1 { + compatible = "usb424,2514"; + reg = <1>; + #address-cells = <1>; + #size-cells = <0>; + + usb-port@1 { + compatible = "usb424,2514"; + reg = <1>; + #address-cells = <1>; + #size-cells = <0>; + + ethernet: ethernet@1 { + compatible = "usb424,7800"; + reg = <1>; + local-mac-address = [ 00 11 22 33 44 55 ]; + + mdio { + #address-cells = <0x1>; + #size-cells = <0x0>; + eth_phy: ethernet-phy@1 { + reg = <1>; + microchip,led-modes = < + LAN78XX_LINK_1000_ACTIVITY + LAN78XX_LINK_10_100_ACTIVITY + >; + }; + }; + }; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/net/mscc-miim.txt b/Documentation/devicetree/bindings/net/mscc-miim.txt new file mode 100644 index 0000000000000000000000000000000000000000..7104679cf59d5ecde0c7d492daecf59fa7be8e29 --- /dev/null +++ b/Documentation/devicetree/bindings/net/mscc-miim.txt @@ -0,0 +1,26 @@ +Microsemi MII Management Controller (MIIM) / MDIO +================================================= + +Properties: +- compatible: must be "mscc,ocelot-miim" +- reg: The base address of the MDIO bus controller register bank. Optionally, a + second register bank can be defined if there is an associated reset register + for internal PHYs +- #address-cells: Must be <1>. +- #size-cells: Must be <0>. MDIO addresses have no size component. +- interrupts: interrupt specifier (refer to the interrupt binding) + +Typically an MDIO bus might have several children. + +Example: + mdio@107009c { + #address-cells = <1>; + #size-cells = <0>; + compatible = "mscc,ocelot-miim"; + reg = <0x107009c 0x36>, <0x10700f0 0x8>; + interrupts = <14>; + + phy0: ethernet-phy@0 { + reg = <0>; + }; + }; diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt new file mode 100644 index 0000000000000000000000000000000000000000..0a84711abece92049c9768bddb5fb95e5f6d812b --- /dev/null +++ b/Documentation/devicetree/bindings/net/mscc-ocelot.txt @@ -0,0 +1,82 @@ +Microsemi Ocelot network Switch +=============================== + +The Microsemi Ocelot network switch can be found on Microsemi SoCs (VSC7513, +VSC7514) + +Required properties: +- compatible: Should be "mscc,vsc7514-switch" +- reg: Must contain an (offset, length) pair of the register set for each + entry in reg-names. +- reg-names: Must include the following entries: + - "sys" + - "rew" + - "qs" + - "hsio" + - "qsys" + - "ana" + - "portX" with X from 0 to the number of last port index available on that + switch +- interrupts: Should contain the switch interrupts for frame extraction and + frame injection +- interrupt-names: should contain the interrupt names: "xtr", "inj" +- ethernet-ports: A container for child nodes representing switch ports. + +The ethernet-ports container has the following properties + +Required properties: + +- #address-cells: Must be 1 +- #size-cells: Must be 0 + +Each port node must have the following mandatory properties: +- reg: Describes the port address in the switch + +Port nodes may also contain the following optional standardised +properties, described in binding documents: + +- phy-handle: Phandle to a PHY on an MDIO bus. See + Documentation/devicetree/bindings/net/ethernet.txt for details. + +Example: + + switch@1010000 { + compatible = "mscc,vsc7514-switch"; + reg = <0x1010000 0x10000>, + <0x1030000 0x10000>, + <0x1080000 0x100>, + <0x10d0000 0x10000>, + <0x11e0000 0x100>, + <0x11f0000 0x100>, + <0x1200000 0x100>, + <0x1210000 0x100>, + <0x1220000 0x100>, + <0x1230000 0x100>, + <0x1240000 0x100>, + <0x1250000 0x100>, + <0x1260000 0x100>, + <0x1270000 0x100>, + <0x1280000 0x100>, + <0x1800000 0x80000>, + <0x1880000 0x10000>; + reg-names = "sys", "rew", "qs", "hsio", "port0", + "port1", "port2", "port3", "port4", "port5", + "port6", "port7", "port8", "port9", "port10", + "qsys", "ana"; + interrupts = <21 22>; + interrupt-names = "xtr", "inj"; + + ethernet-ports { + #address-cells = <1>; + #size-cells = <0>; + + port0: port@0 { + reg = <0>; + phy-handle = <&phy0>; + }; + port1: port@1 { + reg = <1>; + phy-handle = <&phy1>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt new file mode 100644 index 0000000000000000000000000000000000000000..0ea18a53cc29d03cc3c42e07185022587e3e49bb --- /dev/null +++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt @@ -0,0 +1,30 @@ +Qualcomm Bluetooth Chips +--------------------- + +This documents the binding structure and common properties for serial +attached Qualcomm devices. + +Serial attached Qualcomm devices shall be a child node of the host UART +device the slave device is attached to. + +Required properties: + - compatible: should contain one of the following: + * "qcom,qca6174-bt" + +Optional properties: + - enable-gpios: gpio specifier used to enable chip + - clocks: clock provided to the controller (SUSCLK_32KHZ) + +Example: + +serial@7570000 { + label = "BT-UART"; + status = "okay"; + + bluetooth { + compatible = "qcom,qca6174-bt"; + + enable-gpios = <&pm8994_gpios 19 GPIO_ACTIVE_HIGH>; + clocks = <&divclk4>; + }; +}; diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt index 890526dbfc26478f623a8e0b7f78d966509171e7..fac897d54423c11e60ce1438e3ebd68c69db9126 100644 --- a/Documentation/devicetree/bindings/net/renesas,ravb.txt +++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt @@ -21,6 +21,7 @@ Required properties: - "renesas,etheravb-r8a77965" for the R8A77965 SoC. - "renesas,etheravb-r8a77970" for the R8A77970 SoC. - "renesas,etheravb-r8a77980" for the R8A77980 SoC. + - "renesas,etheravb-r8a77990" for the R8A77990 SoC. - "renesas,etheravb-r8a77995" for the R8A77995 SoC. - "renesas,etheravb-rcar-gen3" as a fallback for the above R-Car Gen3 devices. diff --git a/Documentation/devicetree/bindings/net/sff,sfp.txt b/Documentation/devicetree/bindings/net/sff,sfp.txt index 929591d52ed6670c321e3790cdc7d8c8b9787efa..832139919f20a38150c794b811839faa454c840e 100644 --- a/Documentation/devicetree/bindings/net/sff,sfp.txt +++ b/Documentation/devicetree/bindings/net/sff,sfp.txt @@ -7,11 +7,11 @@ Required properties: "sff,sfp" for SFP modules "sff,sff" for soldered down SFF modules -Optional Properties: - - i2c-bus : phandle of an I2C bus controller for the SFP two wire serial interface +Optional Properties: + - mod-def0-gpios : GPIO phandle and a specifier of the MOD-DEF0 (AKA Mod_ABS) module presence input gpio signal, active (module absent) high. Must not be present for SFF modules diff --git a/Documentation/devicetree/bindings/net/sh_eth.txt b/Documentation/devicetree/bindings/net/sh_eth.txt index 5172799a7f1a478d4eb0e834a8672806d87509e7..82a4cf2c145dd6552b71630ac8960f6248eb3bdb 100644 --- a/Documentation/devicetree/bindings/net/sh_eth.txt +++ b/Documentation/devicetree/bindings/net/sh_eth.txt @@ -14,6 +14,7 @@ Required properties: "renesas,ether-r8a7791" if the device is a part of R8A7791 SoC. "renesas,ether-r8a7793" if the device is a part of R8A7793 SoC. "renesas,ether-r8a7794" if the device is a part of R8A7794 SoC. + "renesas,gether-r8a77980" if the device is a part of R8A77980 SoC. "renesas,ether-r7s72100" if the device is a part of R7S72100 SoC. "renesas,rcar-gen1-ether" for a generic R-Car Gen1 device. "renesas,rcar-gen2-ether" for a generic R-Car Gen2 or RZ/G1 diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt index 96398cc2982f82b0673f559999f193b2d2894a7f..fc8f01718690d2cbbbf271cbb0d3117ea8c6041c 100644 --- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt @@ -13,13 +13,25 @@ Required properties: - reg: Address where registers are mapped and size of region. - interrupts: Should contain the MAC interrupt. - phy-mode: See ethernet.txt in the same directory. Allow to choose - "rgmii", "rmii", or "mii" according to the PHY. + "rgmii", "rmii", "mii", or "internal" according to the PHY. + The acceptable mode is SoC-dependent. - phy-handle: Should point to the external phy device. See ethernet.txt file in the same directory. - clocks: A phandle to the clock for the MAC. + For Pro4 SoC, that is "socionext,uniphier-pro4-ave4", + another MAC clock, GIO bus clock and PHY clock are also required. + - clock-names: Should contain + - "ether", "ether-gb", "gio", "ether-phy" for Pro4 SoC + - "ether" for others + - resets: A phandle to the reset control for the MAC. For Pro4 SoC, + GIO bus reset is also required. + - reset-names: Should contain + - "ether", "gio" for Pro4 SoC + - "ether" for others + - socionext,syscon-phy-mode: A phandle to syscon with one argument + that configures phy mode. The argument is the ID of MAC instance. Optional properties: - - resets: A phandle to the reset control for the MAC. - local-mac-address: See ethernet.txt in the same directory. Required subnode: @@ -34,8 +46,11 @@ Example: interrupts = <0 66 4>; phy-mode = "rgmii"; phy-handle = <ðphy>; + clock-names = "ether"; clocks = <&sys_clk 6>; + reset-names = "ether"; resets = <&sys_rst 6>; + socionext,syscon-phy-mode = <&soc_glue 0>; local-mac-address = [00 00 00 00 00 00]; mdio { diff --git a/Documentation/devicetree/bindings/net/stm32-dwmac.txt b/Documentation/devicetree/bindings/net/stm32-dwmac.txt index 489dbcb66c5a273df0d444dee4f0a3a280bf8f83..1341012722aa3b5f6ec1676a7c4950219ddaf9a6 100644 --- a/Documentation/devicetree/bindings/net/stm32-dwmac.txt +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.txt @@ -6,14 +6,28 @@ Please see stmmac.txt for the other unchanged properties. The device node has following properties. Required properties: -- compatible: Should be "st,stm32-dwmac" to select glue, and +- compatible: For MCU family should be "st,stm32-dwmac" to select glue, and "snps,dwmac-3.50a" to select IP version. + For MPU family should be "st,stm32mp1-dwmac" to select + glue, and "snps,dwmac-4.20a" to select IP version. - clocks: Must contain a phandle for each entry in clock-names. - clock-names: Should be "stmmaceth" for the host clock. Should be "mac-clk-tx" for the MAC TX clock. Should be "mac-clk-rx" for the MAC RX clock. + For MPU family need to add also "ethstp" for power mode clock and, + "syscfg-clk" for SYSCFG clock. +- interrupt-names: Should contain a list of interrupt names corresponding to + the interrupts in the interrupts property, if available. + Should be "macirq" for the main MAC IRQ + Should be "eth_wake_irq" for the IT which wake up system - st,syscon : Should be phandle/offset pair. The phandle to the syscon node which - encompases the glue register, and the offset of the control register. + encompases the glue register, and the offset of the control register. + +Optional properties: +- clock-names: For MPU family "mac-clk-ck" for PHY without quartz +- st,int-phyclk (boolean) : valid only where PHY do not have quartz and need to be clock + by RCC + Example: ethernet@40028000 { diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt index 3d2a031217da0851acad1ce8cc1716b376801adc..7fd4e8ce4149a9663ec93f95c9ce5c5dc4741aca 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt @@ -4,6 +4,7 @@ Required properties: - compatible: Should be one of the following: * "qcom,ath10k" * "qcom,ipq4019-wifi" + * "qcom,wcn3990-wifi" PCI based devices uses compatible string "qcom,ath10k" and takes calibration data along with board specific data via "qcom,ath10k-calibration-data". @@ -18,8 +19,12 @@ In general, entry "qcom,ath10k-pre-calibration-data" and "qcom,ath10k-calibration-data" conflict with each other and only one can be provided per device. +SNOC based devices (i.e. wcn3990) uses compatible string "qcom,wcn3990-wifi". + Optional properties: - reg: Address and length of the register set for the device. +- reg-names: Must include the list of following reg names, + "membase" - resets: Must contain an entry for each entry in reset-names. See ../reset/reseti.txt for details. - reset-names: Must include the list of following reset names, @@ -49,6 +54,8 @@ Optional properties: hw versions. - qcom,ath10k-pre-calibration-data : pre calibration data as an array, the length can vary between hw versions. +- -supply: handle to the regulator device tree node + optional "supply-name" is "vdd-0.8-cx-mx". Example (to supply the calibration data alone): @@ -119,3 +126,27 @@ wifi0: wifi@a000000 { qcom,msi_base = <0x40>; qcom,ath10k-pre-calibration-data = [ 01 02 03 ... ]; }; + +Example (to supply wcn3990 SoC wifi block details): + +wifi@18000000 { + compatible = "qcom,wcn3990-wifi"; + reg = <0x18800000 0x800000>; + reg-names = "membase"; + clocks = <&clock_gcc clk_aggre2_noc_clk>; + clock-names = "smmu_aggre2_noc_clk" + interrupts = + <0 130 0 /* CE0 */ >, + <0 131 0 /* CE1 */ >, + <0 132 0 /* CE2 */ >, + <0 133 0 /* CE3 */ >, + <0 134 0 /* CE4 */ >, + <0 135 0 /* CE5 */ >, + <0 136 0 /* CE6 */ >, + <0 137 0 /* CE7 */ >, + <0 138 0 /* CE8 */ >, + <0 139 0 /* CE9 */ >, + <0 140 0 /* CE10 */ >, + <0 141 0 /* CE11 */ >; + vdd-0.8-cx-mx-supply = <&pm8998_l5>; +}; diff --git a/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt b/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt index d69543701d5d58dad79bf41772eb1426d8f4efee..e319fe5e205ac5d32af9726285afe942848f934c 100644 --- a/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt +++ b/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt @@ -4,6 +4,7 @@ Required properties: - compatible: Should be one of the following: "allwinner,sun4i-a10-sid" "allwinner,sun7i-a20-sid" + "allwinner,sun8i-a83t-sid" "allwinner,sun8i-h3-sid" "allwinner,sun50i-a64-sid" diff --git a/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt b/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt new file mode 100644 index 0000000000000000000000000000000000000000..0df79d9e07ec22f50ef7c6641afea3ba6c311aa0 --- /dev/null +++ b/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt @@ -0,0 +1,40 @@ +Zodiac Inflight Innovations RAVE EEPROM Bindings + +RAVE SP EEPROM device is a "MFD cell" device exposing physical EEPROM +attached to RAVE Supervisory Processor. It is expected that its Device +Tree node is specified as a child of the node corresponding to the +parent RAVE SP device (as documented in +Documentation/devicetree/bindings/mfd/zii,rave-sp.txt) + +Required properties: + +- compatible: Should be "zii,rave-sp-eeprom" + +Optional properties: + +- zii,eeprom-name: Unique EEPROM identifier describing its function in the + system. Will be used as created NVMEM deivce's name. + +Data cells: + +Data cells are child nodes of eerpom node, bindings for which are +documented in Documentation/devicetree/bindings/nvmem/nvmem.txt + +Example: + + rave-sp { + compatible = "zii,rave-sp-rdu1"; + current-speed = <38400>; + + eeprom@a4 { + compatible = "zii,rave-sp-eeprom"; + reg = <0xa4 0x4000>; + #address-cells = <1>; + #size-cells = <1>; + zii,eeprom-name = "main-eeprom"; + + wdt_timeout: wdt-timeout@81 { + reg = <0x81 2>; + }; + }; + } diff --git a/Documentation/devicetree/bindings/opp/kryo-cpufreq.txt b/Documentation/devicetree/bindings/opp/kryo-cpufreq.txt new file mode 100644 index 0000000000000000000000000000000000000000..c2127b96805a3e7b3c723fbb432bbcebaacbd545 --- /dev/null +++ b/Documentation/devicetree/bindings/opp/kryo-cpufreq.txt @@ -0,0 +1,680 @@ +Qualcomm Technologies, Inc. KRYO CPUFreq and OPP bindings +=================================== + +In Certain Qualcomm Technologies, Inc. SoCs like apq8096 and msm8996 +that have KRYO processors, the CPU ferequencies subset and voltage value +of each OPP varies based on the silicon variant in use. +Qualcomm Technologies, Inc. Process Voltage Scaling Tables +defines the voltage and frequency value based on the msm-id in SMEM +and speedbin blown in the efuse combination. +The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC +to provide the OPP framework with required information (existing HW bitmap). +This is used to determine the voltage and frequency value for each OPP of +operating-points-v2 table when it is parsed by the OPP framework. + +Required properties: +-------------------- +In 'cpus' nodes: +- operating-points-v2: Phandle to the operating-points-v2 table to use. + +In 'operating-points-v2' table: +- compatible: Should be + - 'operating-points-v2-kryo-cpu' for apq8096 and msm8996. +- nvmem-cells: A phandle pointing to a nvmem-cells node representing the + efuse registers that has information about the + speedbin that is used to select the right frequency/voltage + value pair. + Please refer the for nvmem-cells + bindings Documentation/devicetree/bindings/nvmem/nvmem.txt + and also examples below. + +In every OPP node: +- opp-supported-hw: A single 32 bit bitmap value, representing compatible HW. + Bitmap: + 0: MSM8996 V3, speedbin 0 + 1: MSM8996 V3, speedbin 1 + 2: MSM8996 V3, speedbin 2 + 3: unused + 4: MSM8996 SG, speedbin 0 + 5: MSM8996 SG, speedbin 1 + 6: MSM8996 SG, speedbin 2 + 7-31: unused + +Example 1: +--------- + + cpus { + #address-cells = <2>; + #size-cells = <0>; + + CPU0: cpu@0 { + device_type = "cpu"; + compatible = "qcom,kryo"; + reg = <0x0 0x0>; + enable-method = "psci"; + clocks = <&kryocc 0>; + cpu-supply = <&pm8994_s11_saw>; + operating-points-v2 = <&cluster0_opp>; + #cooling-cells = <2>; + next-level-cache = <&L2_0>; + L2_0: l2-cache { + compatible = "cache"; + cache-level = <2>; + }; + }; + + CPU1: cpu@1 { + device_type = "cpu"; + compatible = "qcom,kryo"; + reg = <0x0 0x1>; + enable-method = "psci"; + clocks = <&kryocc 0>; + cpu-supply = <&pm8994_s11_saw>; + operating-points-v2 = <&cluster0_opp>; + #cooling-cells = <2>; + next-level-cache = <&L2_0>; + }; + + CPU2: cpu@100 { + device_type = "cpu"; + compatible = "qcom,kryo"; + reg = <0x0 0x100>; + enable-method = "psci"; + clocks = <&kryocc 1>; + cpu-supply = <&pm8994_s11_saw>; + operating-points-v2 = <&cluster1_opp>; + #cooling-cells = <2>; + next-level-cache = <&L2_1>; + L2_1: l2-cache { + compatible = "cache"; + cache-level = <2>; + }; + }; + + CPU3: cpu@101 { + device_type = "cpu"; + compatible = "qcom,kryo"; + reg = <0x0 0x101>; + enable-method = "psci"; + clocks = <&kryocc 1>; + cpu-supply = <&pm8994_s11_saw>; + operating-points-v2 = <&cluster1_opp>; + #cooling-cells = <2>; + next-level-cache = <&L2_1>; + }; + + cpu-map { + cluster0 { + core0 { + cpu = <&CPU0>; + }; + + core1 { + cpu = <&CPU1>; + }; + }; + + cluster1 { + core0 { + cpu = <&CPU2>; + }; + + core1 { + cpu = <&CPU3>; + }; + }; + }; + }; + + cluster0_opp: opp_table0 { + compatible = "operating-points-v2-kryo-cpu"; + nvmem-cells = <&speedbin_efuse>; + opp-shared; + + opp-307200000 { + opp-hz = /bits/ 64 <307200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x77>; + clock-latency-ns = <200000>; + }; + opp-384000000 { + opp-hz = /bits/ 64 <384000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-422400000 { + opp-hz = /bits/ 64 <422400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-460800000 { + opp-hz = /bits/ 64 <460800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-480000000 { + opp-hz = /bits/ 64 <480000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-537600000 { + opp-hz = /bits/ 64 <537600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-556800000 { + opp-hz = /bits/ 64 <556800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-614400000 { + opp-hz = /bits/ 64 <614400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-652800000 { + opp-hz = /bits/ 64 <652800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-691200000 { + opp-hz = /bits/ 64 <691200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-729600000 { + opp-hz = /bits/ 64 <729600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-768000000 { + opp-hz = /bits/ 64 <768000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-844800000 { + opp-hz = /bits/ 64 <844800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x77>; + clock-latency-ns = <200000>; + }; + opp-902400000 { + opp-hz = /bits/ 64 <902400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-960000000 { + opp-hz = /bits/ 64 <960000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-979200000 { + opp-hz = /bits/ 64 <979200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1036800000 { + opp-hz = /bits/ 64 <1036800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1056000000 { + opp-hz = /bits/ 64 <1056000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1113600000 { + opp-hz = /bits/ 64 <1113600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1132800000 { + opp-hz = /bits/ 64 <1132800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1190400000 { + opp-hz = /bits/ 64 <1190400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1209600000 { + opp-hz = /bits/ 64 <1209600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1228800000 { + opp-hz = /bits/ 64 <1228800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1286400000 { + opp-hz = /bits/ 64 <1286400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1324800000 { + opp-hz = /bits/ 64 <1324800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x5>; + clock-latency-ns = <200000>; + }; + opp-1363200000 { + opp-hz = /bits/ 64 <1363200000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x72>; + clock-latency-ns = <200000>; + }; + opp-1401600000 { + opp-hz = /bits/ 64 <1401600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x5>; + clock-latency-ns = <200000>; + }; + opp-1440000000 { + opp-hz = /bits/ 64 <1440000000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1478400000 { + opp-hz = /bits/ 64 <1478400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x1>; + clock-latency-ns = <200000>; + }; + opp-1497600000 { + opp-hz = /bits/ 64 <1497600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x4>; + clock-latency-ns = <200000>; + }; + opp-1516800000 { + opp-hz = /bits/ 64 <1516800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1593600000 { + opp-hz = /bits/ 64 <1593600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x71>; + clock-latency-ns = <200000>; + }; + opp-1996800000 { + opp-hz = /bits/ 64 <1996800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x20>; + clock-latency-ns = <200000>; + }; + opp-2188800000 { + opp-hz = /bits/ 64 <2188800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x10>; + clock-latency-ns = <200000>; + }; + }; + + cluster1_opp: opp_table1 { + compatible = "operating-points-v2-kryo-cpu"; + nvmem-cells = <&speedbin_efuse>; + opp-shared; + + opp-307200000 { + opp-hz = /bits/ 64 <307200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x77>; + clock-latency-ns = <200000>; + }; + opp-384000000 { + opp-hz = /bits/ 64 <384000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-403200000 { + opp-hz = /bits/ 64 <403200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-460800000 { + opp-hz = /bits/ 64 <460800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-480000000 { + opp-hz = /bits/ 64 <480000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-537600000 { + opp-hz = /bits/ 64 <537600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-556800000 { + opp-hz = /bits/ 64 <556800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-614400000 { + opp-hz = /bits/ 64 <614400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-652800000 { + opp-hz = /bits/ 64 <652800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-691200000 { + opp-hz = /bits/ 64 <691200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-729600000 { + opp-hz = /bits/ 64 <729600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-748800000 { + opp-hz = /bits/ 64 <748800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-806400000 { + opp-hz = /bits/ 64 <806400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-825600000 { + opp-hz = /bits/ 64 <825600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-883200000 { + opp-hz = /bits/ 64 <883200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-902400000 { + opp-hz = /bits/ 64 <902400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-940800000 { + opp-hz = /bits/ 64 <940800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-979200000 { + opp-hz = /bits/ 64 <979200000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1036800000 { + opp-hz = /bits/ 64 <1036800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1056000000 { + opp-hz = /bits/ 64 <1056000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1113600000 { + opp-hz = /bits/ 64 <1113600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1132800000 { + opp-hz = /bits/ 64 <1132800000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1190400000 { + opp-hz = /bits/ 64 <1190400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1209600000 { + opp-hz = /bits/ 64 <1209600000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1248000000 { + opp-hz = /bits/ 64 <1248000000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1286400000 { + opp-hz = /bits/ 64 <1286400000>; + opp-microvolt = <905000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1324800000 { + opp-hz = /bits/ 64 <1324800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1363200000 { + opp-hz = /bits/ 64 <1363200000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1401600000 { + opp-hz = /bits/ 64 <1401600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1440000000 { + opp-hz = /bits/ 64 <1440000000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1478400000 { + opp-hz = /bits/ 64 <1478400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1516800000 { + opp-hz = /bits/ 64 <1516800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1555200000 { + opp-hz = /bits/ 64 <1555200000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1593600000 { + opp-hz = /bits/ 64 <1593600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1632000000 { + opp-hz = /bits/ 64 <1632000000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1670400000 { + opp-hz = /bits/ 64 <1670400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1708800000 { + opp-hz = /bits/ 64 <1708800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1747200000 { + opp-hz = /bits/ 64 <1747200000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x70>; + clock-latency-ns = <200000>; + }; + opp-1785600000 { + opp-hz = /bits/ 64 <1785600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x7>; + clock-latency-ns = <200000>; + }; + opp-1804800000 { + opp-hz = /bits/ 64 <1804800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x6>; + clock-latency-ns = <200000>; + }; + opp-1824000000 { + opp-hz = /bits/ 64 <1824000000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x71>; + clock-latency-ns = <200000>; + }; + opp-1900800000 { + opp-hz = /bits/ 64 <1900800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x74>; + clock-latency-ns = <200000>; + }; + opp-1920000000 { + opp-hz = /bits/ 64 <1920000000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x1>; + clock-latency-ns = <200000>; + }; + opp-1977600000 { + opp-hz = /bits/ 64 <1977600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x30>; + clock-latency-ns = <200000>; + }; + opp-1996800000 { + opp-hz = /bits/ 64 <1996800000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x1>; + clock-latency-ns = <200000>; + }; + opp-2054400000 { + opp-hz = /bits/ 64 <2054400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x30>; + clock-latency-ns = <200000>; + }; + opp-2073600000 { + opp-hz = /bits/ 64 <2073600000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x1>; + clock-latency-ns = <200000>; + }; + opp-2150400000 { + opp-hz = /bits/ 64 <2150400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x31>; + clock-latency-ns = <200000>; + }; + opp-2246400000 { + opp-hz = /bits/ 64 <2246400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x10>; + clock-latency-ns = <200000>; + }; + opp-2342400000 { + opp-hz = /bits/ 64 <2342400000>; + opp-microvolt = <1140000 905000 1140000>; + opp-supported-hw = <0x10>; + clock-latency-ns = <200000>; + }; + }; + +.... + +reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; +.... + smem_mem: smem-mem@86000000 { + reg = <0x0 0x86000000 0x0 0x200000>; + no-map; + }; +.... +}; + +smem { + compatible = "qcom,smem"; + memory-region = <&smem_mem>; + hwlocks = <&tcsr_mutex 3>; +}; + +soc { +.... + qfprom: qfprom@74000 { + compatible = "qcom,qfprom"; + reg = <0x00074000 0x8ff>; + #address-cells = <1>; + #size-cells = <1>; + .... + speedbin_efuse: speedbin@133 { + reg = <0x133 0x1>; + bits = <5 3>; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt index 4e4f30288c8bb726667e93c415c79477fc1e9253..c396c4c0af92db1faa8d0f7ee7fe44b6a7fac242 100644 --- a/Documentation/devicetree/bindings/opp/opp.txt +++ b/Documentation/devicetree/bindings/opp/opp.txt @@ -82,7 +82,10 @@ This defines voltage-current-frequency combinations along with other related properties. Required properties: -- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. +- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a + required property for all device nodes but devices like power domains. The + power domain nodes must have another (implementation dependent) property which + uniquely identifies the OPP nodes. Optional properties: - opp-microvolt: voltage in micro Volts. @@ -159,7 +162,7 @@ Optional properties: - status: Marks the node enabled/disabled. -- required-opp: This contains phandle to an OPP node in another device's OPP +- required-opps: This contains phandle to an OPP node in another device's OPP table. It may contain an array of phandles, where each phandle points to an OPP of a different device. It should not contain multiple phandles to the OPP nodes in the same OPP table. This specifies the minimum required OPP of the diff --git a/Documentation/devicetree/bindings/pci/designware-pcie.txt b/Documentation/devicetree/bindings/pci/designware-pcie.txt index 1da7ade3183c8f7e0b88a7315036bdd16fcaa09c..c124f9bc11f3a46579887ec06fd270af372d577b 100644 --- a/Documentation/devicetree/bindings/pci/designware-pcie.txt +++ b/Documentation/devicetree/bindings/pci/designware-pcie.txt @@ -1,7 +1,9 @@ * Synopsys DesignWare PCIe interface Required properties: -- compatible: should contain "snps,dw-pcie" to identify the core. +- compatible: + "snps,dw-pcie" for RC mode; + "snps,dw-pcie-ep" for EP mode; - reg: Should contain the configuration address space. - reg-names: Must be "config" for the PCIe configuration space. (The old way of getting the configuration address space from "ranges" @@ -41,11 +43,11 @@ EP mode: Example configuration: - pcie: pcie@dffff000 { + pcie: pcie@dfc00000 { compatible = "snps,dw-pcie"; - reg = <0xdffff000 0x1000>, /* Controller registers */ - <0xd0000000 0x2000>; /* PCI config space */ - reg-names = "ctrlreg", "config"; + reg = <0xdfc00000 0x0001000>, /* IP registers */ + <0xd0000000 0x0002000>; /* Configuration space */ + reg-names = "dbi", "config"; #address-cells = <3>; #size-cells = <2>; device_type = "pci"; @@ -54,5 +56,15 @@ Example configuration: interrupts = <25>, <24>; #interrupt-cells = <1>; num-lanes = <1>; - num-viewport = <3>; + }; +or + pcie: pcie@dfc00000 { + compatible = "snps,dw-pcie-ep"; + reg = <0xdfc00000 0x0001000>, /* IP registers 1 */ + <0xdfc01000 0x0001000>, /* IP registers 2 */ + <0xd0000000 0x2000000>; /* Configuration space */ + reg-names = "dbi", "dbi2", "addr_space"; + num-ib-windows = <6>; + num-ob-windows = <2>; + num-lanes = <1>; }; diff --git a/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt b/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt index 7bf9df047a1ee4992ee7f8b9f8e3f742b26bc8a0..0dcb87d6554fdaa2f8c36dc37ab6ea545f80bc86 100644 --- a/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt +++ b/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt @@ -3,7 +3,7 @@ HiSilicon Hip05 and Hip06 PCIe host bridge DT description HiSilicon PCIe host controller is based on the Synopsys DesignWare PCI core. It shares common functions with the PCIe DesignWare core driver and inherits common properties defined in -Documentation/devicetree/bindings/pci/designware-pci.txt. +Documentation/devicetree/bindings/pci/designware-pcie.txt. Additional properties are described here: diff --git a/Documentation/devicetree/bindings/pci/kirin-pcie.txt b/Documentation/devicetree/bindings/pci/kirin-pcie.txt index 6e217c63123db6c1021dba04d09ec419be05fd67..6bbe43818ad5d23d1e726cbc408ef08a86ea46fb 100644 --- a/Documentation/devicetree/bindings/pci/kirin-pcie.txt +++ b/Documentation/devicetree/bindings/pci/kirin-pcie.txt @@ -3,7 +3,7 @@ HiSilicon Kirin SoCs PCIe host DT description Kirin PCIe host controller is based on the Synopsys DesignWare PCI core. It shares common functions with the PCIe DesignWare core driver and inherits common properties defined in -Documentation/devicetree/bindings/pci/designware-pci.txt. +Documentation/devicetree/bindings/pci/designware-pcie.txt. Additional properties are described here: diff --git a/Documentation/devicetree/bindings/pci/mobiveil-pcie.txt b/Documentation/devicetree/bindings/pci/mobiveil-pcie.txt new file mode 100644 index 0000000000000000000000000000000000000000..65038aa642e5e39cf40b6968308d9aba3fbf57eb --- /dev/null +++ b/Documentation/devicetree/bindings/pci/mobiveil-pcie.txt @@ -0,0 +1,73 @@ +* Mobiveil AXI PCIe Root Port Bridge DT description + +Mobiveil's GPEX 4.0 is a PCIe Gen4 root port bridge IP. This configurable IP +has up to 8 outbound and inbound windows for the address translation. + +Required properties: +- #address-cells: Address representation for root ports, set to <3> +- #size-cells: Size representation for root ports, set to <2> +- #interrupt-cells: specifies the number of cells needed to encode an + interrupt source. The value must be 1. +- compatible: Should contain "mbvl,gpex40-pcie" +- reg: Should contain PCIe registers location and length + "config_axi_slave": PCIe controller registers + "csr_axi_slave" : Bridge config registers + "gpio_slave" : GPIO registers to control slot power + "apb_csr" : MSI registers + +- device_type: must be "pci" +- apio-wins : number of requested apio outbound windows + default 2 outbound windows are configured - + 1. Config window + 2. Memory window +- ppio-wins : number of requested ppio inbound windows + default 1 inbound memory window is configured. +- bus-range: PCI bus numbers covered +- interrupt-controller: identifies the node as an interrupt controller +- #interrupt-cells: specifies the number of cells needed to encode an + interrupt source. The value must be 1. +- interrupt-parent : phandle to the interrupt controller that + it is attached to, it should be set to gic to point to + ARM's Generic Interrupt Controller node in system DT. +- interrupts: The interrupt line of the PCIe controller + last cell of this field is set to 4 to + denote it as IRQ_TYPE_LEVEL_HIGH type interrupt. +- interrupt-map-mask, + interrupt-map: standard PCI properties to define the mapping of the + PCI interface to interrupt numbers. +- ranges: ranges for the PCI memory regions (I/O space region is not + supported by hardware) + Please refer to the standard PCI bus binding document for a more + detailed explanation + + +Example: +++++++++ + pcie0: pcie@a0000000 { + #address-cells = <3>; + #size-cells = <2>; + compatible = "mbvl,gpex40-pcie"; + reg = <0xa0000000 0x00001000>, + <0xb0000000 0x00010000>, + <0xff000000 0x00200000>, + <0xb0010000 0x00001000>; + reg-names = "config_axi_slave", + "csr_axi_slave", + "gpio_slave", + "apb_csr"; + device_type = "pci"; + apio-wins = <2>; + ppio-wins = <1>; + bus-range = <0x00000000 0x000000ff>; + interrupt-controller; + interrupt-parent = <&gic>; + #interrupt-cells = <1>; + interrupts = < 0 89 4 >; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 0 &pci_express 0>, + <0 0 0 1 &pci_express 1>, + <0 0 0 2 &pci_express 2>, + <0 0 0 3 &pci_express 3>; + ranges = < 0x83000000 0 0x00000000 0xa8000000 0 0x8000000>; + + }; diff --git a/Documentation/devicetree/bindings/pci/pci-armada8k.txt b/Documentation/devicetree/bindings/pci/pci-armada8k.txt index c1e4c3d10a747c008c2b914c89194228caed395e..9e3fc15e1af88d0853e07b78ebc0b5e62a9d064a 100644 --- a/Documentation/devicetree/bindings/pci/pci-armada8k.txt +++ b/Documentation/devicetree/bindings/pci/pci-armada8k.txt @@ -12,7 +12,10 @@ Required properties: - "ctrl" for the control register region - "config" for the config space region - interrupts: Interrupt specifier for the PCIe controler -- clocks: reference to the PCIe controller clock +- clocks: reference to the PCIe controller clocks +- clock-names: mandatory if there is a second clock, in this case the + name must be "core" for the first clock and "reg" for the second + one Example: diff --git a/Documentation/devicetree/bindings/pci/pci-keystone.txt b/Documentation/devicetree/bindings/pci/pci-keystone.txt index 7e05487544edffae025e1071406fe711cf220773..3d4a209b0fd057e173fa24c7e20bc75c234445d5 100644 --- a/Documentation/devicetree/bindings/pci/pci-keystone.txt +++ b/Documentation/devicetree/bindings/pci/pci-keystone.txt @@ -3,9 +3,9 @@ TI Keystone PCIe interface Keystone PCI host Controller is based on the Synopsys DesignWare PCI hardware version 3.65. It shares common functions with the PCIe DesignWare core driver and inherits common properties defined in -Documentation/devicetree/bindings/pci/designware-pci.txt +Documentation/devicetree/bindings/pci/designware-pcie.txt -Please refer to Documentation/devicetree/bindings/pci/designware-pci.txt +Please refer to Documentation/devicetree/bindings/pci/designware-pcie.txt for the details of DesignWare DT bindings. Additional properties are described here as well as properties that are not applicable. diff --git a/Documentation/devicetree/bindings/pci/rcar-pci.txt b/Documentation/devicetree/bindings/pci/rcar-pci.txt index 1fb614e615da3c393b5028cb884fed97e0aa7ebf..a5f7fc62d10e98221eed5314771c8cc09b8622ee 100644 --- a/Documentation/devicetree/bindings/pci/rcar-pci.txt +++ b/Documentation/devicetree/bindings/pci/rcar-pci.txt @@ -8,6 +8,7 @@ compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC; "renesas,pcie-r8a7793" for the R8A7793 SoC; "renesas,pcie-r8a7795" for the R8A7795 SoC; "renesas,pcie-r8a7796" for the R8A7796 SoC; + "renesas,pcie-r8a77980" for the R8A77980 SoC; "renesas,pcie-rcar-gen2" for a generic R-Car Gen2 or RZ/G1 compatible device. "renesas,pcie-rcar-gen3" for a generic R-Car Gen3 compatible device. @@ -32,6 +33,11 @@ compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC; and PCIe bus clocks. - clock-names: from common clock binding: should be "pcie" and "pcie_bus". +Optional properties: +- phys: from common PHY binding: PHY phandle and specifier (only make sense + for R-Car gen3 SoCs where the PCIe PHYs have their own register blocks). +- phy-names: from common PHY binding: should be "pcie". + Example: SoC-specific DT Entry: diff --git a/Documentation/devicetree/bindings/pci/rockchip-pcie-ep.txt b/Documentation/devicetree/bindings/pci/rockchip-pcie-ep.txt new file mode 100644 index 0000000000000000000000000000000000000000..778467307a93228553d79fc8f57d755f7f284b25 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/rockchip-pcie-ep.txt @@ -0,0 +1,62 @@ +* Rockchip AXI PCIe Endpoint Controller DT description + +Required properties: +- compatible: Should contain "rockchip,rk3399-pcie-ep" +- reg: Two register ranges as listed in the reg-names property +- reg-names: Must include the following names + - "apb-base" + - "mem-base" +- clocks: Must contain an entry for each entry in clock-names. + See ../clocks/clock-bindings.txt for details. +- clock-names: Must include the following entries: + - "aclk" + - "aclk-perf" + - "hclk" + - "pm" +- resets: Must contain seven entries for each entry in reset-names. + See ../reset/reset.txt for details. +- reset-names: Must include the following names + - "core" + - "mgmt" + - "mgmt-sticky" + - "pipe" + - "pm" + - "aclk" + - "pclk" +- pinctrl-names : The pin control state names +- pinctrl-0: The "default" pinctrl state +- phys: Must contain an phandle to a PHY for each entry in phy-names. +- phy-names: Must include 4 entries for all 4 lanes even if some of + them won't be used for your cases. Entries are of the form "pcie-phy-N": + where N ranges from 0 to 3. + (see example below and you MUST also refer to ../phy/rockchip-pcie-phy.txt + for changing the #phy-cells of phy node to support it) +- rockchip,max-outbound-regions: Maximum number of outbound regions + +Optional Property: +- num-lanes: number of lanes to use +- max-functions: Maximum number of functions that can be configured (default 1). + +pcie0-ep: pcie@f8000000 { + compatible = "rockchip,rk3399-pcie-ep"; + #address-cells = <3>; + #size-cells = <2>; + rockchip,max-outbound-regions = <16>; + clocks = <&cru ACLK_PCIE>, <&cru ACLK_PERF_PCIE>, + <&cru PCLK_PCIE>, <&cru SCLK_PCIE_PM>; + clock-names = "aclk", "aclk-perf", + "hclk", "pm"; + max-functions = /bits/ 8 <8>; + num-lanes = <4>; + reg = <0x0 0xfd000000 0x0 0x1000000>, <0x0 0x80000000 0x0 0x20000>; + reg-names = "apb-base", "mem-base"; + resets = <&cru SRST_PCIE_CORE>, <&cru SRST_PCIE_MGMT>, + <&cru SRST_PCIE_MGMT_STICKY>, <&cru SRST_PCIE_PIPE> , + <&cru SRST_PCIE_PM>, <&cru SRST_P_PCIE>, <&cru SRST_A_PCIE>; + reset-names = "core", "mgmt", "mgmt-sticky", "pipe", + "pm", "pclk", "aclk"; + phys = <&pcie_phy 0>, <&pcie_phy 1>, <&pcie_phy 2>, <&pcie_phy 3>; + phy-names = "pcie-phy-0", "pcie-phy-1", "pcie-phy-2", "pcie-phy-3"; + pinctrl-names = "default"; + pinctrl-0 = <&pcie_clkreq>; +}; diff --git a/Documentation/devicetree/bindings/pci/rockchip-pcie.txt b/Documentation/devicetree/bindings/pci/rockchip-pcie-host.txt similarity index 100% rename from Documentation/devicetree/bindings/pci/rockchip-pcie.txt rename to Documentation/devicetree/bindings/pci/rockchip-pcie-host.txt diff --git a/Documentation/devicetree/bindings/pci/xgene-pci.txt b/Documentation/devicetree/bindings/pci/xgene-pci.txt index 6fd2decfa66c4b30d0de6d77f57ceef77eadd28a..92490330dc1c7e7444dfe1176944cbb06b36a97b 100644 --- a/Documentation/devicetree/bindings/pci/xgene-pci.txt +++ b/Documentation/devicetree/bindings/pci/xgene-pci.txt @@ -25,8 +25,6 @@ Optional properties: Example: -SoC-specific DT Entry: - pcie0: pcie@1f2b0000 { status = "disabled"; device_type = "pci"; @@ -50,8 +48,3 @@ SoC-specific DT Entry: clocks = <&pcie0clk 0>; }; - -Board-specific DT Entry: - &pcie0 { - status = "ok"; - }; diff --git a/Documentation/devicetree/bindings/phy/phy-mtk-xsphy.txt b/Documentation/devicetree/bindings/phy/phy-mtk-xsphy.txt new file mode 100644 index 0000000000000000000000000000000000000000..e7caefa0b9c2687475e1458098c3ebb6dbbf66bf --- /dev/null +++ b/Documentation/devicetree/bindings/phy/phy-mtk-xsphy.txt @@ -0,0 +1,109 @@ +MediaTek XS-PHY binding +-------------------------- + +The XS-PHY controller supports physical layer functionality for USB3.1 +GEN2 controller on MediaTek SoCs. + +Required properties (controller (parent) node): + - compatible : should be "mediatek,-xsphy", "mediatek,xsphy", + soc-model is the name of SoC, such as mt3611 etc; + when using "mediatek,xsphy" compatible string, you need SoC specific + ones in addition, one of: + - "mediatek,mt3611-xsphy" + + - #address-cells, #size-cells : should use the same values as the root node + - ranges: must be present + +Optional properties (controller (parent) node): + - reg : offset and length of register shared by multiple U3 ports, + exclude port's private register, if only U2 ports provided, + shouldn't use the property. + - mediatek,src-ref-clk-mhz : u32, frequency of reference clock for slew rate + calibrate + - mediatek,src-coef : u32, coefficient for slew rate calibrate, depends on + SoC process + +Required nodes : a sub-node is required for each port the controller + provides. Address range information including the usual + 'reg' property is used inside these nodes to describe + the controller's topology. + +Required properties (port (child) node): +- reg : address and length of the register set for the port. +- clocks : a list of phandle + clock-specifier pairs, one for each + entry in clock-names +- clock-names : must contain + "ref": 48M reference clock for HighSpeed analog phy; and 26M + reference clock for SuperSpeedPlus analog phy, sometimes is + 24M, 25M or 27M, depended on platform. +- #phy-cells : should be 1 + cell after port phandle is phy type from: + - PHY_TYPE_USB2 + - PHY_TYPE_USB3 + +The following optional properties are only for debug or HQA test +Optional properties (PHY_TYPE_USB2 port (child) node): +- mediatek,eye-src : u32, the value of slew rate calibrate +- mediatek,eye-vrt : u32, the selection of VRT reference voltage +- mediatek,eye-term : u32, the selection of HS_TX TERM reference voltage +- mediatek,efuse-intr : u32, the selection of Internal Resistor + +Optional properties (PHY_TYPE_USB3 port (child) node): +- mediatek,efuse-intr : u32, the selection of Internal Resistor +- mediatek,efuse-tx-imp : u32, the selection of TX Impedance +- mediatek,efuse-rx-imp : u32, the selection of RX Impedance + +Banks layout of xsphy +------------------------------------------------------------- +port offset bank +u2 port0 0x0000 MISC + 0x0100 FMREG + 0x0300 U2PHY_COM +u2 port1 0x1000 MISC + 0x1100 FMREG + 0x1300 U2PHY_COM +u2 port2 0x2000 MISC + ... +u31 common 0x3000 DIG_GLB + 0x3100 PHYA_GLB +u31 port0 0x3400 DIG_LN_TOP + 0x3500 DIG_LN_TX0 + 0x3600 DIG_LN_RX0 + 0x3700 DIG_LN_DAIF + 0x3800 PHYA_LN +u31 port1 0x3a00 DIG_LN_TOP + 0x3b00 DIG_LN_TX0 + 0x3c00 DIG_LN_RX0 + 0x3d00 DIG_LN_DAIF + 0x3e00 PHYA_LN + ... + +DIG_GLB & PHYA_GLB are shared by U31 ports. + +Example: + +u3phy: usb-phy@11c40000 { + compatible = "mediatek,mt3611-xsphy", "mediatek,xsphy"; + reg = <0 0x11c43000 0 0x0200>; + mediatek,src-ref-clk-mhz = <26>; + mediatek,src-coef = <17>; + #address-cells = <2>; + #size-cells = <2>; + ranges; + + u2port0: usb-phy@11c40000 { + reg = <0 0x11c40000 0 0x0400>; + clocks = <&clk48m>; + clock-names = "ref"; + mediatek,eye-src = <4>; + #phy-cells = <1>; + }; + + u3port0: usb-phy@11c43000 { + reg = <0 0x11c43400 0 0x0500>; + clocks = <&clk26m>; + clock-names = "ref"; + mediatek,efuse-intr = <28>; + #phy-cells = <1>; + }; +}; diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt index dcf1b8f691d508b4186ffb1cd7de31d78a6ff8fb..266a1bb8bb6e4f1e86bc07ded0265cfa6002239a 100644 --- a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt +++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt @@ -9,7 +9,8 @@ Required properties: "qcom,ipq8074-qmp-pcie-phy" for PCIe phy on IPQ8074 "qcom,msm8996-qmp-pcie-phy" for 14nm PCIe phy on msm8996, "qcom,msm8996-qmp-usb3-phy" for 14nm USB3 phy on msm8996, - "qcom,qmp-v3-usb3-phy" for USB3 QMP V3 phy. + "qcom,sdm845-qmp-usb3-phy" for USB3 QMP V3 phy on sdm845, + "qcom,sdm845-qmp-usb3-uni-phy" for USB3 QMP V3 UNI phy on sdm845. - reg: offset and length of register set for PHY's common serdes block. diff --git a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt index 42c97426836eaf50fbccb2271d1d719c5918fa06..03025d97998b48276ceec27adaa230debd9febfa 100644 --- a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt +++ b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt @@ -6,7 +6,7 @@ QUSB2 controller supports LS/FS/HS usb connectivity on Qualcomm chipsets. Required properties: - compatible: compatible list, contains "qcom,msm8996-qusb2-phy" for 14nm PHY on msm8996, - "qcom,qusb2-v2-phy" for QUSB2 V2 PHY. + "qcom,sdm845-qusb2-phy" for 10nm PHY on sdm845. - reg: offset and length of the PHY register set. - #phy-cells: must be 0. @@ -27,6 +27,27 @@ Optional properties: tuning parameter value for qusb2 phy. - qcom,tcsr-syscon: Phandle to TCSR syscon register region. + - qcom,imp-res-offset-value: It is a 6 bit value that specifies offset to be + added to PHY refgen RESCODE via IMP_CTRL1 register. It is a PHY + tuning parameter that may vary for different boards of same SOC. + This property is applicable to only QUSB2 v2 PHY (sdm845). + - qcom,hstx-trim-value: It is a 4 bit value that specifies tuning for HSTX + output current. + Possible range is - 15mA to 24mA (stepsize of 600 uA). + See dt-bindings/phy/phy-qcom-qusb2.h for applicable values. + This property is applicable to only QUSB2 v2 PHY (sdm845). + Default value is 22.2mA for sdm845. + - qcom,preemphasis-level: It is a 2 bit value that specifies pre-emphasis level. + Possible range is 0 to 15% (stepsize of 5%). + See dt-bindings/phy/phy-qcom-qusb2.h for applicable values. + This property is applicable to only QUSB2 v2 PHY (sdm845). + Default value is 10% for sdm845. +- qcom,preemphasis-width: It is a 1 bit value that specifies how long the HSTX + pre-emphasis (specified using qcom,preemphasis-level) must be in + effect. Duration could be half-bit of full-bit. + See dt-bindings/phy/phy-qcom-qusb2.h for applicable values. + This property is applicable to only QUSB2 v2 PHY (sdm845). + Default value is full-bit width for sdm845. Example: hsusb_phy: phy@7411000 { diff --git a/Documentation/devicetree/bindings/pinctrl/actions,s900-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/actions,s900-pinctrl.txt index fb87c7d74f2e56eea0192c183287bec56a97e70c..8fb5a53775e88afea215e3318ec352278e53624e 100644 --- a/Documentation/devicetree/bindings/pinctrl/actions,s900-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/actions,s900-pinctrl.txt @@ -8,6 +8,17 @@ Required Properties: - reg: Should contain the register base address and size of the pin controller. - clocks: phandle of the clock feeding the pin controller +- gpio-controller: Marks the device node as a GPIO controller. +- gpio-ranges: Specifies the mapping between gpio controller and + pin-controller pins. +- #gpio-cells: Should be two. The first cell is the gpio pin number + and the second cell is used for optional parameters. +- interrupt-controller: Marks the device node as an interrupt controller. +- #interrupt-cells: Specifies the number of cells needed to encode an + interrupt. Shall be set to 2. The first cell + defines the interrupt number, the second encodes + the trigger flags described in + bindings/interrupt-controller/interrupts.txt Please refer to pinctrl-bindings.txt in this directory for details of the common pinctrl bindings used by client devices, including the meaning of the @@ -164,6 +175,11 @@ Example: compatible = "actions,s900-pinctrl"; reg = <0x0 0xe01b0000 0x0 0x1000>; clocks = <&cmu CLK_GPIO>; + gpio-controller; + gpio-ranges = <&pinctrl 0 0 146>; + #gpio-cells = <2>; + interrupt-controller; + #interrupt-cells = <2>; uart2-default: uart2-default { pinmux { diff --git a/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt index 64bc5c2a76da613609b3c4db8f3bc00bf9a38107..258a4648ab813304b9008daf7572c32ee7b973ca 100644 --- a/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt @@ -28,6 +28,7 @@ Required properties: "allwinner,sun50i-a64-r-pinctrl" "allwinner,sun50i-h5-pinctrl" "allwinner,sun50i-h6-pinctrl" + "allwinner,sun50i-h6-r-pinctrl" "nextthing,gr8-pinctrl" - reg: Should contain the register physical address and length for the diff --git a/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt b/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt index 2569866c692f4a8c90b2641953deb41ef370d3db..3fac0a061bcc7c5b1cc6dbfa76b4b006e6876f48 100644 --- a/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt +++ b/Documentation/devicetree/bindings/pinctrl/brcm,bcm2835-gpio.txt @@ -36,6 +36,24 @@ listed. In other words, a subnode that lists only a mux function implies no information about any pull configuration. Similarly, a subnode that lists only a pul parameter implies no information about the mux function. +The BCM2835 pin configuration and multiplexing supports the generic bindings. +For details on each properties, you can refer to ./pinctrl-bindings.txt. + +Required sub-node properties: + - pins + - function + +Optional sub-node properties: + - bias-disable + - bias-pull-up + - bias-pull-down + - output-high + - output-low + +Legacy pin configuration and multiplexing binding: +*** (Its use is deprecated, use generic multiplexing and configuration +bindings instead) + Required subnode-properties: - brcm,pins: An array of cells. Each cell contains the ID of a pin. Valid IDs are the integer GPIO IDs; 0==GPIO0, 1==GPIO1, ... 53==GPIO53. diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-max77620.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-max77620.txt index ad4fce3552bbec734726f08a694294512a3f7038..511fc234558bf1743c76c2f2d03c4c40b2a22bfd 100644 --- a/Documentation/devicetree/bindings/pinctrl/pinctrl-max77620.txt +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-max77620.txt @@ -11,9 +11,9 @@ Optional Pinmux properties: -------------------------- Following properties are required if default setting of pins are required at boot. -- pinctrl-names: A pinctrl state named per . +- pinctrl-names: A pinctrl state named per . - pinctrl[0...n]: Properties to contain the phandle for pinctrl states per - . + . The pin configurations are defined as child of the pinctrl states node. Each sub-node have following properties: diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-mcp23s08.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-mcp23s08.txt index a5a8322a31bda245497776187a4d393e091c7349..625a22e2f2115050477e59c076203be38346ae91 100644 --- a/Documentation/devicetree/bindings/pinctrl/pinctrl-mcp23s08.txt +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-mcp23s08.txt @@ -18,7 +18,9 @@ Required properties: removed. - #gpio-cells : Should be two. - first cell is the pin number - - second cell is used to specify flags. Flags are currently unused. + - second cell is used to specify flags as described in + 'Documentation/devicetree/bindings/gpio/gpio.txt'. Allowed values defined by + 'include/dt-bindings/gpio/gpio.h' (e.g. GPIO_ACTIVE_LOW). - gpio-controller : Marks the device node as a GPIO controller. - reg : For an address on its bus. I2C uses this a the I2C address of the chip. SPI uses this to specify the chipselect line which the chip is @@ -99,9 +101,9 @@ Optional Pinmux properties: -------------------------- Following properties are required if default setting of pins are required at boot. -- pinctrl-names: A pinctrl state named per . +- pinctrl-names: A pinctrl state named per . - pinctrl[0...n]: Properties to contain the phandle for pinctrl states per - . + . The pin configurations are defined as child of the pinctrl states node. Each sub-node have following properties: diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-mt7622.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-mt7622.txt index f18ed99f6e1455d0082327c0dde5797f589bb6a1..def8fcad894147ed4e7d4ea14b974e902c7baab7 100644 --- a/Documentation/devicetree/bindings/pinctrl/pinctrl-mt7622.txt +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-mt7622.txt @@ -9,6 +9,16 @@ Required properties for the root node: - #gpio-cells: Should be two. The first cell is the pin number and the second is the GPIO flags. +Optional properties: +- interrupt-controller : Marks the device node as an interrupt controller + +If the property interrupt-controller is defined, following property is required +- reg-names: A string describing the "reg" entries. Must contain "eint". +- interrupts : The interrupt output from the controller. +- #interrupt-cells: Should be two. +- interrupt-parent: Phandle of the interrupt parent to which the external + GPIO interrupts are forwarded to. + Please refer to pinctrl-bindings.txt in this directory for details of the common pinctrl bindings used by client devices, including the meaning of the phrase "pin configuration node". diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-rk805.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-rk805.txt index eee3dc26093437e22b2271dda50c627eafcfe39f..cbcbd31e3ce850bf9d3cd501027350991bfefaf4 100644 --- a/Documentation/devicetree/bindings/pinctrl/pinctrl-rk805.txt +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-rk805.txt @@ -10,9 +10,9 @@ Optional Pinmux properties: -------------------------- Following properties are required if default setting of pins are required at boot. -- pinctrl-names: A pinctrl state named per . +- pinctrl-names: A pinctrl state named per . - pinctrl[0...n]: Properties to contain the phandle for pinctrl states per - . + . The pin configurations are defined as child of the pinctrl states node. Each sub-node have following properties: diff --git a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt index 892d8fd7b70036a8beef8eab75a08bf8d555384a..abd8fbcf1e62d480562a0fb9d1471020c0c29d75 100644 --- a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt @@ -15,6 +15,7 @@ Required Properties: - "renesas,pfc-r8a7740": for R8A7740 (R-Mobile A1) compatible pin-controller. - "renesas,pfc-r8a7743": for R8A7743 (RZ/G1M) compatible pin-controller. - "renesas,pfc-r8a7745": for R8A7745 (RZ/G1E) compatible pin-controller. + - "renesas,pfc-r8a77470": for R8A77470 (RZ/G1C) compatible pin-controller. - "renesas,pfc-r8a7778": for R8A7778 (R-Car M1) compatible pin-controller. - "renesas,pfc-r8a7779": for R8A7779 (R-Car H1) compatible pin-controller. - "renesas,pfc-r8a7790": for R8A7790 (R-Car H2) compatible pin-controller. @@ -27,6 +28,7 @@ Required Properties: - "renesas,pfc-r8a77965": for R8A77965 (R-Car M3-N) compatible pin-controller. - "renesas,pfc-r8a77970": for R8A77970 (R-Car V3M) compatible pin-controller. - "renesas,pfc-r8a77980": for R8A77980 (R-Car V3H) compatible pin-controller. + - "renesas,pfc-r8a77990": for R8A77990 (R-Car E3) compatible pin-controller. - "renesas,pfc-r8a77995": for R8A77995 (R-Car D3) compatible pin-controller. - "renesas,pfc-sh73a0": for SH73A0 (SH-Mobile AG5) compatible pin-controller. diff --git a/Documentation/devicetree/bindings/pinctrl/rockchip,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/rockchip,pinctrl.txt index a01a3b8a23632a8707ff9e528c6c010500d8da9a..0919db294c174e66710280d8d6688f78d436e3cd 100644 --- a/Documentation/devicetree/bindings/pinctrl/rockchip,pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/rockchip,pinctrl.txt @@ -20,6 +20,7 @@ defined as gpio sub-nodes of the pinmux controller. Required properties for iomux controller: - compatible: should be + "rockchip,px30-pinctrl": for Rockchip PX30 "rockchip,rv1108-pinctrl": for Rockchip RV1108 "rockchip,rk2928-pinctrl": for Rockchip RK2928 "rockchip,rk3066a-pinctrl": for Rockchip RK3066a diff --git a/Documentation/devicetree/bindings/power/fsl,imx-gpc.txt b/Documentation/devicetree/bindings/power/fsl,imx-gpc.txt index b31d6bbeee1644e7a553cb69fab1da5e9a225ec6..726ec2875223d8038b2a8fa7300d7ccfd8db2884 100644 --- a/Documentation/devicetree/bindings/power/fsl,imx-gpc.txt +++ b/Documentation/devicetree/bindings/power/fsl,imx-gpc.txt @@ -14,7 +14,7 @@ Required properties: datasheet - interrupts: Should contain one interrupt specifier for the GPC interrupt - clocks: Must contain an entry for each entry in clock-names. - See Documentation/devicetree/bindings/clocks/clock-bindings.txt for details. + See Documentation/devicetree/bindings/clock/clock-bindings.txt for details. - clock-names: Must include the following entries: - ipg diff --git a/Documentation/devicetree/bindings/power/pd-samsung.txt b/Documentation/devicetree/bindings/power/pd-samsung.txt index 549f7dee9b9de0b33f851ab1bb34008ab9d1a51a..92ef355e8f6433c58c14f4bc3d05f39fbf7ea858 100644 --- a/Documentation/devicetree/bindings/power/pd-samsung.txt +++ b/Documentation/devicetree/bindings/power/pd-samsung.txt @@ -15,23 +15,13 @@ Required Properties: Optional Properties: - label: Human readable string with domain name. Will be visible in userspace to let user to distinguish between multiple domains in SoC. -- clocks: List of clock handles. The parent clocks of the input clocks to the - devices in this power domain are set to oscclk before power gating - and restored back after powering on a domain. This is required for - all domains which are powered on and off and not required for unused - domains. -- clock-names: The following clocks can be specified: - - oscclk: Oscillator clock. - - clkN: Input clocks to the devices in this power domain. These clocks - will be reparented to oscclk before switching power domain off. - Their original parent will be brought back after turning on - the domain. Maximum of 4 clocks (N = 0 to 3) are supported. - - asbN: Clocks required by asynchronous bridges (ASB) present in - the power domain. These clock should be enabled during power - domain on/off operations. - power-domains: phandle pointing to the parent power domain, for more details see Documentation/devicetree/bindings/power/power_domain.txt +Deprecated Properties: +- clocks +- clock-names + Node of a device using power domains must have a power-domains property defined with a phandle to respective power domain. @@ -47,8 +37,6 @@ Example: mfc_pd: power-domain@10044060 { compatible = "samsung,exynos4210-pd"; reg = <0x10044060 0x20>; - clocks = <&clock CLK_FIN_PLL>, <&clock CLK_MOUT_USER_ACLK333>; - clock-names = "oscclk", "clk0"; #power-domain-cells = <0>; label = "MFC"; }; diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt index f3355313c020f7b644b1283781ae4c33a4a0d192..7dec508987c75c70ac194876df9d4f8019586aab 100644 --- a/Documentation/devicetree/bindings/power/power_domain.txt +++ b/Documentation/devicetree/bindings/power/power_domain.txt @@ -111,8 +111,8 @@ Example 3: ==PM domain consumers== Required properties: - - power-domains : A phandle and PM domain specifier as defined by bindings of - the power controller specified by phandle. + - power-domains : A list of PM domain specifiers, as defined by bindings of + the power controller that is the PM domain provider. Example: @@ -122,12 +122,21 @@ Example: power-domains = <&power 0>; }; -The node above defines a typical PM domain consumer device, which is located -inside a PM domain with index 0 of a power controller represented by a node -with the label "power". + leaky-device@12351000 { + compatible = "foo,i-leak-current"; + reg = <0x12351000 0x1000>; + power-domains = <&power 0>, <&power 1> ; + }; + +The first example above defines a typical PM domain consumer device, which is +located inside a PM domain with index 0 of a power controller represented by a +node with the label "power". +In the second example the consumer device are partitioned across two PM domains, +the first with index 0 and the second with index 1, of a power controller that +is represented by a node with the label "power". Optional properties: -- required-opp: This contains phandle to an OPP node in another device's OPP +- required-opps: This contains phandle to an OPP node in another device's OPP table. It may contain an array of phandles, where each phandle points to an OPP of a different device. It should not contain multiple phandles to the OPP nodes in the same OPP table. This specifies the minimum required OPP of the @@ -175,14 +184,14 @@ Example: compatible = "foo,i-leak-current"; reg = <0x12350000 0x1000>; power-domains = <&power 0>; - required-opp = <&domain0_opp_0>; + required-opps = <&domain0_opp_0>; }; leaky-device1@12350000 { compatible = "foo,i-leak-current"; reg = <0x12350000 0x1000>; power-domains = <&power 1>; - required-opp = <&domain1_opp_1>; + required-opps = <&domain1_opp_1>; }; [1]. Documentation/devicetree/bindings/power/domain-idle-state.txt diff --git a/Documentation/devicetree/bindings/power/renesas,rcar-sysc.txt b/Documentation/devicetree/bindings/power/renesas,rcar-sysc.txt index ab399e559257a106448f583dbd7275137d4c5bdb..180ae65be753ae43ecc39a294a3a7163bdb517d5 100644 --- a/Documentation/devicetree/bindings/power/renesas,rcar-sysc.txt +++ b/Documentation/devicetree/bindings/power/renesas,rcar-sysc.txt @@ -9,6 +9,7 @@ Required properties: - compatible: Must contain exactly one of the following: - "renesas,r8a7743-sysc" (RZ/G1M) - "renesas,r8a7745-sysc" (RZ/G1E) + - "renesas,r8a77470-sysc" (RZ/G1C) - "renesas,r8a7779-sysc" (R-Car H1) - "renesas,r8a7790-sysc" (R-Car H2) - "renesas,r8a7791-sysc" (R-Car M2-W) @@ -20,6 +21,7 @@ Required properties: - "renesas,r8a77965-sysc" (R-Car M3-N) - "renesas,r8a77970-sysc" (R-Car V3M) - "renesas,r8a77980-sysc" (R-Car V3H) + - "renesas,r8a77990-sysc" (R-Car E3) - "renesas,r8a77995-sysc" (R-Car D3) - reg: Address start and address range for the device. - #power-domain-cells: Must be 1. diff --git a/Documentation/devicetree/bindings/power/rockchip-io-domain.txt b/Documentation/devicetree/bindings/power/rockchip-io-domain.txt index 4a4766e9c254e1dae631de52cdf19b7b04454142..e66fd4eab71cba5d6af32870156c5cb54aa091cb 100644 --- a/Documentation/devicetree/bindings/power/rockchip-io-domain.txt +++ b/Documentation/devicetree/bindings/power/rockchip-io-domain.txt @@ -31,6 +31,8 @@ SoC is on the same page. Required properties: - compatible: should be one of: + - "rockchip,px30-io-voltage-domain" for px30 + - "rockchip,px30-pmu-io-voltage-domain" for px30 pmu-domains - "rockchip,rk3188-io-voltage-domain" for rk3188 - "rockchip,rk3228-io-voltage-domain" for rk3228 - "rockchip,rk3288-io-voltage-domain" for rk3288 @@ -51,6 +53,19 @@ a phandle the relevant regulator. All specified supplies must be able to report their voltage. The IO Voltage Domain for any non-specified supplies will be not be touched. +Possible supplies for PX30: +- vccio6-supply: The supply connected to VCCIO6. +- vccio1-supply: The supply connected to VCCIO1. +- vccio2-supply: The supply connected to VCCIO2. +- vccio3-supply: The supply connected to VCCIO3. +- vccio4-supply: The supply connected to VCCIO4. +- vccio5-supply: The supply connected to VCCIO5. +- vccio-oscgpi-supply: The supply connected to VCCIO_OSCGPI. + +Possible supplies for PX30 pmu-domains: +- pmuio1-supply: The supply connected to PMUIO1. +- pmuio2-supply: The supply connected to PMUIO2. + Possible supplies for rk3188: - ap0-supply: The supply connected to AP0_VCC. - ap1-supply: The supply connected to AP1_VCC. diff --git a/Documentation/devicetree/bindings/power/supply/ab8500/btemp.txt b/Documentation/devicetree/bindings/power/supply/ab8500/btemp.txt index 0ba1bcc7f33aad00408c287f8d1fd95911443c41..f181e46d8e077c333fe3e6ea71d976bee0ec61a0 100644 --- a/Documentation/devicetree/bindings/power/supply/ab8500/btemp.txt +++ b/Documentation/devicetree/bindings/power/supply/ab8500/btemp.txt @@ -13,4 +13,4 @@ Required Properties: }; For information on battery specific node, Ref: -Documentation/devicetree/bindings/power_supply/ab8500/fg.txt +Documentation/devicetree/bindings/power/supply/ab8500/fg.txt diff --git a/Documentation/devicetree/bindings/power/supply/ab8500/chargalg.txt b/Documentation/devicetree/bindings/power/supply/ab8500/chargalg.txt index ef5328371122c58a3ecb7ed359873de4c8a6203a..56636f927203e8398a6430d61b47da4b65cfed49 100644 --- a/Documentation/devicetree/bindings/power/supply/ab8500/chargalg.txt +++ b/Documentation/devicetree/bindings/power/supply/ab8500/chargalg.txt @@ -13,4 +13,4 @@ ab8500_chargalg { }; For information on battery specific node, Ref: -Documentation/devicetree/bindings/power_supply/ab8500/fg.txt +Documentation/devicetree/bindings/power/supply/ab8500/fg.txt diff --git a/Documentation/devicetree/bindings/power/supply/ab8500/charger.txt b/Documentation/devicetree/bindings/power/supply/ab8500/charger.txt index 6bdbb08ea9e0f78a82f52c7aac9a5f7dee180c9b..24ada03e07b4dda824fc479a61ce35ea18834dea 100644 --- a/Documentation/devicetree/bindings/power/supply/ab8500/charger.txt +++ b/Documentation/devicetree/bindings/power/supply/ab8500/charger.txt @@ -22,4 +22,4 @@ Required Properties: }; For information on battery specific node, Ref: -Documentation/devicetree/bindings/power_supply/ab8500/fg.txt +Documentation/devicetree/bindings/power/supply/ab8500/fg.txt diff --git a/Documentation/devicetree/bindings/power/supply/bq27xxx.txt b/Documentation/devicetree/bindings/power/supply/bq27xxx.txt index 615c1cb6889f89be4e20793fa8d709a5f5b490f0..37994fdb18cae7afec76cded46b009e11de8dfaf 100644 --- a/Documentation/devicetree/bindings/power/supply/bq27xxx.txt +++ b/Documentation/devicetree/bindings/power/supply/bq27xxx.txt @@ -25,6 +25,7 @@ Required properties: * "ti,bq27545" - BQ27545 * "ti,bq27421" - BQ27421 * "ti,bq27425" - BQ27425 + * "ti,bq27426" - BQ27426 * "ti,bq27441" - BQ27441 * "ti,bq27621" - BQ27621 - reg: integer, I2C address of the fuel gauge. diff --git a/Documentation/devicetree/bindings/power/wakeup-source.txt b/Documentation/devicetree/bindings/power/wakeup-source.txt index 5d254ab13ebf384034b7e8629365a7b0070c2af3..cfd74659fbed03e00a6de0e7f0150b32fbf3713b 100644 --- a/Documentation/devicetree/bindings/power/wakeup-source.txt +++ b/Documentation/devicetree/bindings/power/wakeup-source.txt @@ -22,7 +22,7 @@ List of legacy properties and respective binding document 3. "has-tpo" Documentation/devicetree/bindings/rtc/rtc-opal.txt 4. "linux,wakeup" Documentation/devicetree/bindings/input/gpio-matrix-keypad.txt Documentation/devicetree/bindings/mfd/tc3589x.txt - Documentation/devicetree/bindings/input/ads7846.txt + Documentation/devicetree/bindings/input/touchscreen/ads7846.txt 5. "linux,keypad-wakeup" Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt 6. "linux,input-wakeup" Documentation/devicetree/bindings/input/samsung-keypad.txt 7. "nvidia,wakeup-source" Documentation/devicetree/bindings/input/nvidia,tegra20-kbc.txt diff --git a/Documentation/devicetree/bindings/pps/pps-gpio.txt b/Documentation/devicetree/bindings/pps/pps-gpio.txt index 0de23b79365729d376a97032d76231ad70aa9b38..3683874832ae10a03c9fe842299d315e4d305d7d 100644 --- a/Documentation/devicetree/bindings/pps/pps-gpio.txt +++ b/Documentation/devicetree/bindings/pps/pps-gpio.txt @@ -20,5 +20,4 @@ Example: assert-falling-edge; compatible = "pps-gpio"; - status = "okay"; }; diff --git a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt new file mode 100644 index 0000000000000000000000000000000000000000..0f569d8e73a3cb09fa1fb9d9dcf27375e3aad6ed --- /dev/null +++ b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt @@ -0,0 +1,69 @@ +* Freescale QorIQ 1588 timer based PTP clock + +General Properties: + + - compatible Should be "fsl,etsec-ptp" + - reg Offset and length of the register set for the device + - interrupts There should be at least two interrupts. Some devices + have as many as four PTP related interrupts. + +Clock Properties: + + - fsl,cksel Timer reference clock source. + - fsl,tclk-period Timer reference clock period in nanoseconds. + - fsl,tmr-prsc Prescaler, divides the output clock. + - fsl,tmr-add Frequency compensation value. + - fsl,tmr-fiper1 Fixed interval period pulse generator. + - fsl,tmr-fiper2 Fixed interval period pulse generator. + - fsl,max-adj Maximum frequency adjustment in parts per billion. + + These properties set the operational parameters for the PTP + clock. You must choose these carefully for the clock to work right. + Here is how to figure good values: + + TimerOsc = selected reference clock MHz + tclk_period = desired clock period nanoseconds + NominalFreq = 1000 / tclk_period MHz + FreqDivRatio = TimerOsc / NominalFreq (must be greater that 1.0) + tmr_add = ceil(2^32 / FreqDivRatio) + OutputClock = NominalFreq / tmr_prsc MHz + PulseWidth = 1 / OutputClock microseconds + FiperFreq1 = desired frequency in Hz + FiperDiv1 = 1000000 * OutputClock / FiperFreq1 + tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period + max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1 + + The calculation for tmr_fiper2 is the same as for tmr_fiper1. The + driver expects that tmr_fiper1 will be correctly set to produce a 1 + Pulse Per Second (PPS) signal, since this will be offered to the PPS + subsystem to synchronize the Linux clock. + + Reference clock source is determined by the value, which is holded + in CKSEL bits in TMR_CTRL register. "fsl,cksel" property keeps the + value, which will be directly written in those bits, that is why, + according to reference manual, the next clock sources can be used: + + <0> - external high precision timer reference clock (TSEC_TMR_CLK + input is used for this purpose); + <1> - eTSEC system clock; + <2> - eTSEC1 transmit clock; + <3> - RTC clock input. + + When this attribute is not used, eTSEC system clock will serve as + IEEE 1588 timer reference clock. + +Example: + + ptp_clock@24e00 { + compatible = "fsl,etsec-ptp"; + reg = <0x24E00 0xB0>; + interrupts = <12 0x8 13 0x8>; + interrupt-parent = < &ipic >; + fsl,cksel = <1>; + fsl,tclk-period = <10>; + fsl,tmr-prsc = <100>; + fsl,tmr-add = <0x999999A4>; + fsl,tmr-fiper1 = <0x3B9AC9F6>; + fsl,tmr-fiper2 = <0x00018696>; + fsl,max-adj = <659999998>; + }; diff --git a/Documentation/devicetree/bindings/pwm/pwm-omap-dmtimer.txt b/Documentation/devicetree/bindings/pwm/pwm-omap-dmtimer.txt index 2e53324fb720db9b29db0baaf6bf878ce9b9fd0e..5ccfcc82da08347e9ed073c353ad2210731c4024 100644 --- a/Documentation/devicetree/bindings/pwm/pwm-omap-dmtimer.txt +++ b/Documentation/devicetree/bindings/pwm/pwm-omap-dmtimer.txt @@ -2,7 +2,7 @@ Required properties: - compatible: Shall contain "ti,omap-dmtimer-pwm". -- ti,timers: phandle to PWM capable OMAP timer. See arm/omap/timer.txt for info +- ti,timers: phandle to PWM capable OMAP timer. See timer/ti,timer.txt for info about these timers. - #pwm-cells: Should be 3. See pwm.txt in this directory for a description of the cells format. diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt b/Documentation/devicetree/bindings/regulator/pfuze100.txt index c6dd3f5e485b75cf0cab3543b43dfbada9c2b175..f0ada3b14d70d05e22f6da2be3e3b297c439412a 100644 --- a/Documentation/devicetree/bindings/regulator/pfuze100.txt +++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt @@ -21,7 +21,7 @@ Each regulator is defined using the standard binding for regulators. Example 1: PFUZE100 - pmic: pfuze100@8 { + pfuze100: pmic@8 { compatible = "fsl,pfuze100"; reg = <0x08>; @@ -122,7 +122,7 @@ Example 1: PFUZE100 Example 2: PFUZE200 - pmic: pfuze200@8 { + pfuze200: pmic@8 { compatible = "fsl,pfuze200"; reg = <0x08>; @@ -216,7 +216,7 @@ Example 2: PFUZE200 Example 3: PFUZE3000 - pmic: pfuze3000@8 { + pfuze3000: pmic@8 { compatible = "fsl,pfuze3000"; reg = <0x08>; diff --git a/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt b/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt index 57d2c65899df2cfdce7aafdc9ba3667094b1f784..406f2e570c50de2650c30fbba09d809efe0cb888 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt +++ b/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt @@ -110,6 +110,11 @@ Qualcomm SPMI Regulators Definition: Reference to regulator supplying the input pin, as described in the data sheet. +- qcom,saw-reg: + Usage: optional + Value type: + Description: Reference to syscon node defining the SAW registers. + The regulator node houses sub-nodes for each regulator within the device. Each sub-node is identified using the node's name, with valid values listed for each @@ -201,6 +206,17 @@ see regulator.txt - with additional custom properties described below: 2 = 0.55 uA 3 = 0.75 uA +- qcom,saw-slave: + Usage: optional + Value type: + Description: SAW controlled gang slave. Will not be configured. + +- qcom,saw-leader: + Usage: optional + Value type: + Description: SAW controlled gang leader. Will be configured as + SAW regulator. + Example: regulators { @@ -221,3 +237,32 @@ Example: .... }; + +Example 2: + + saw3: syscon@9A10000 { + compatible = "syscon"; + reg = <0x9A10000 0x1000>; + }; + + ... + + spm-regulators { + compatible = "qcom,pm8994-regulators"; + qcom,saw-reg = <&saw3>; + s8 { + qcom,saw-slave; + }; + s9 { + qcom,saw-slave; + }; + s10 { + qcom,saw-slave; + }; + pm8994_s11_saw: s11 { + qcom,saw-leader; + regulator-always-on; + regulator-min-microvolt = <900000>; + regulator-max-microvolt = <1140000>; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/regulator.txt b/Documentation/devicetree/bindings/regulator/regulator.txt index 2babe15b618d91d0f91d141da5bc0a77a37c5c48..a7cd36877bfe046905f5d8a4020985a0e6736c7e 100644 --- a/Documentation/devicetree/bindings/regulator/regulator.txt +++ b/Documentation/devicetree/bindings/regulator/regulator.txt @@ -59,6 +59,11 @@ Optional properties: - regulator-initial-mode: initial operating mode. The set of possible operating modes depends on the capabilities of every hardware so each device binding documentation explains which values the regulator supports. +- regulator-allowed-modes: list of operating modes that software is allowed to + configure for the regulator at run-time. Elements may be specified in any + order. The set of possible operating modes depends on the capabilities of + every hardware so each device binding document explains which values the + regulator supports. - regulator-system-load: Load in uA present on regulator that is not captured by any consumer request. - regulator-pull-down: Enable pull down resistor when the regulator is disabled. @@ -68,6 +73,11 @@ Optional properties: 0: Disable active discharge. 1: Enable active discharge. Absence of this property will leave configuration to default. +- regulator-coupled-with: Regulators with which the regulator + is coupled. The linkage is 2-way - all coupled regulators should be linked + with each other. A regulator should not be coupled with its supplier. +- regulator-coupled-max-spread: Max spread between voltages of coupled regulators + in microvolts. Deprecated properties: - regulator-compatible: If a regulator chip contains multiple diff --git a/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt new file mode 100644 index 0000000000000000000000000000000000000000..4edf3137d9f731c35fb47639f1fe8357560d907a --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/rohm,bd71837-regulator.txt @@ -0,0 +1,126 @@ +ROHM BD71837 Power Management Integrated Circuit (PMIC) regulator bindings + +BD71837MWV is a programmable Power Management +IC (PMIC) for powering single-core, dual-core, and +quad-core SoC’s such as NXP-i.MX 8M. It is optimized +for low BOM cost and compact solution footprint. It +integrates 8 Buck regulators and 7 LDO’s to provide all +the power rails required by the SoC and the commonly +used peripherals. + +Required properties: + - regulator-name: should be "buck1", ..., "buck8" and "ldo1", ..., "ldo7" + +List of regulators provided by this controller. BD71837 regulators node +should be sub node of the BD71837 MFD node. See BD71837 MFD bindings at +Documentation/devicetree/bindings/mfd/rohm,bd71837-pmic.txt +Regulator nodes should be named to BUCK_ and LDO_. The +definition for each of these nodes is defined using the standard +binding for regulators at +Documentation/devicetree/bindings/regulator/regulator.txt. +Note that if BD71837 starts at RUN state you probably want to use +regulator-boot-on at least for BUCK6 and BUCK7 so that those are not +disabled by driver at startup. LDO5 and LDO6 are supplied by those and +if they are disabled at startup the voltage monitoring for LDO5/LDO6 will +cause PMIC to reset. + +The valid names for regulator nodes are: +BUCK1, BUCK2, BUCK3, BUCK4, BUCK5, BUCK6, BUCK7, BUCK8 +LDO1, LDO2, LDO3, LDO4, LDO5, LDO6, LDO7 + +Optional properties: +- Any optional property defined in bindings/regulator/regulator.txt + +Example: +regulators { + buck1: BUCK1 { + regulator-name = "buck1"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1300000>; + regulator-boot-on; + regulator-ramp-delay = <1250>; + }; + buck2: BUCK2 { + regulator-name = "buck2"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1300000>; + regulator-boot-on; + regulator-always-on; + regulator-ramp-delay = <1250>; + }; + buck3: BUCK3 { + regulator-name = "buck3"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1300000>; + regulator-boot-on; + }; + buck4: BUCK4 { + regulator-name = "buck4"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1300000>; + regulator-boot-on; + }; + buck5: BUCK5 { + regulator-name = "buck5"; + regulator-min-microvolt = <700000>; + regulator-max-microvolt = <1350000>; + regulator-boot-on; + }; + buck6: BUCK6 { + regulator-name = "buck6"; + regulator-min-microvolt = <3000000>; + regulator-max-microvolt = <3300000>; + regulator-boot-on; + }; + buck7: BUCK7 { + regulator-name = "buck7"; + regulator-min-microvolt = <1605000>; + regulator-max-microvolt = <1995000>; + regulator-boot-on; + }; + buck8: BUCK8 { + regulator-name = "buck8"; + regulator-min-microvolt = <800000>; + regulator-max-microvolt = <1400000>; + }; + + ldo1: LDO1 { + regulator-name = "ldo1"; + regulator-min-microvolt = <3000000>; + regulator-max-microvolt = <3300000>; + regulator-boot-on; + }; + ldo2: LDO2 { + regulator-name = "ldo2"; + regulator-min-microvolt = <900000>; + regulator-max-microvolt = <900000>; + regulator-boot-on; + }; + ldo3: LDO3 { + regulator-name = "ldo3"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <3300000>; + }; + ldo4: LDO4 { + regulator-name = "ldo4"; + regulator-min-microvolt = <900000>; + regulator-max-microvolt = <1800000>; + }; + ldo5: LDO5 { + regulator-name = "ldo5"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <3300000>; + }; + ldo6: LDO6 { + regulator-name = "ldo6"; + regulator-min-microvolt = <900000>; + regulator-max-microvolt = <1800000>; + }; + ldo7_reg: LDO7 { + regulator-name = "ldo7"; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <3300000>; + }; +}; + + diff --git a/Documentation/devicetree/bindings/regulator/sy8106a-regulator.txt b/Documentation/devicetree/bindings/regulator/sy8106a-regulator.txt new file mode 100644 index 0000000000000000000000000000000000000000..39a8ca73f5721b8943c6711f863e81a69463ced3 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/sy8106a-regulator.txt @@ -0,0 +1,23 @@ +SY8106A Voltage regulator + +Required properties: +- compatible: Must be "silergy,sy8106a" +- reg: I2C slave address - must be <0x65> +- silergy,fixed-microvolt - the voltage when I2C regulating is disabled (set + by external resistor like a fixed voltage) + +Any property defined as part of the core regulator binding, defined in +./regulator.txt, can also be used. + +Example: + + sy8106a { + compatible = "silergy,sy8106a"; + reg = <0x65>; + regulator-name = "sy8106a-vdd"; + silergy,fixed-microvolt = <1200000>; + regulator-min-microvolt = <1000000>; + regulator-max-microvolt = <1400000>; + regulator-boot-on; + regulator-always-on; + }; diff --git a/Documentation/devicetree/bindings/regulator/tps65090.txt b/Documentation/devicetree/bindings/regulator/tps65090.txt index ca69f5e3040cfa48299682dd6371f99c90b49ffa..ae326f26359740bce4fe7ac119288447649b6429 100644 --- a/Documentation/devicetree/bindings/regulator/tps65090.txt +++ b/Documentation/devicetree/bindings/regulator/tps65090.txt @@ -16,7 +16,7 @@ Required properties: Optional properties: - ti,enable-ext-control: This is applicable for DCDC1, DCDC2 and DCDC3. If DCDCs are externally controlled then this property should be there. -- "dcdc-ext-control-gpios: This is applicable for DCDC1, DCDC2 and DCDC3. +- dcdc-ext-control-gpios: This is applicable for DCDC1, DCDC2 and DCDC3. If DCDCs are externally controlled and if it is from GPIO then GPIO number should be provided. If it is externally controlled and no GPIO entry then driver will just configure this rails as external control diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt index 00d3d58a102fe6da2ed0b70b7ce2d61772b17822..d9018242545044e6ac0cad6f7bbead4bdbbb87f6 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt +++ b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt @@ -11,6 +11,7 @@ on the Qualcomm Hexagon core. "qcom,msm8916-mss-pil", "qcom,msm8974-mss-pil" "qcom,msm8996-mss-pil" + "qcom,sdm845-mss-pil" - reg: Usage: required diff --git a/Documentation/devicetree/bindings/reserved-memory/qcom,cmd-db.txt b/Documentation/devicetree/bindings/reserved-memory/qcom,cmd-db.txt new file mode 100644 index 0000000000000000000000000000000000000000..68395530c0a5095cd184d8e344ab2b413b36b4b5 --- /dev/null +++ b/Documentation/devicetree/bindings/reserved-memory/qcom,cmd-db.txt @@ -0,0 +1,37 @@ +Command DB +--------- + +Command DB is a database that provides a mapping between resource key and the +resource address for a system resource managed by a remote processor. The data +is stored in a shared memory region and is loaded by the remote processor. + +Some of the Qualcomm Technologies Inc SoC's have hardware accelerators for +controlling shared resources. Depending on the board configuration the shared +resource properties may change. These properties are dynamically probed by the +remote processor and made available in the shared memory. + +The bindings for Command DB is specified in the reserved-memory section in +devicetree. The devicetree representation of the command DB driver should be: + +Properties: +- compatible: + Usage: required + Value type: + Definition: Should be "qcom,cmd-db" + +- reg: + Usage: required + Value type: + Definition: The register address that points to the actual location of + the Command DB in memory. + +Example: + + reserved-memory { + [...] + reserved-memory@85fe0000 { + reg = <0x0 0x85fe0000 0x0 0x20000>; + compatible = "qcom,cmd-db"; + no-map; + }; + }; diff --git a/Documentation/devicetree/bindings/reset/renesas,rst.txt b/Documentation/devicetree/bindings/reset/renesas,rst.txt index 294a0dae106a2623160b42dd1d947102095e5a37..67e83b02e10b63e33615e7f516618cfe238c84c3 100644 --- a/Documentation/devicetree/bindings/reset/renesas,rst.txt +++ b/Documentation/devicetree/bindings/reset/renesas,rst.txt @@ -17,6 +17,7 @@ Required properties: Examples with soctypes are: - "renesas,r8a7743-rst" (RZ/G1M) - "renesas,r8a7745-rst" (RZ/G1E) + - "renesas,r8a77470-rst" (RZ/G1C) - "renesas,r8a7778-reset-wdt" (R-Car M1A) - "renesas,r8a7779-reset-wdt" (R-Car H1) - "renesas,r8a7790-rst" (R-Car H2) @@ -29,6 +30,7 @@ Required properties: - "renesas,r8a77965-rst" (R-Car M3-N) - "renesas,r8a77970-rst" (R-Car V3M) - "renesas,r8a77980-rst" (R-Car V3H) + - "renesas,r8a77990-rst" (R-Car E3) - "renesas,r8a77995-rst" (R-Car D3) - reg: Address start and address range for the device. diff --git a/Documentation/devicetree/bindings/reset/st,sti-softreset.txt b/Documentation/devicetree/bindings/reset/st,sti-softreset.txt index a21658f18fe6d7d593adece10e056e072fa2c5e4..3661e6153a92bf8df66cea43d5f41415cc497786 100644 --- a/Documentation/devicetree/bindings/reset/st,sti-softreset.txt +++ b/Documentation/devicetree/bindings/reset/st,sti-softreset.txt @@ -15,7 +15,7 @@ Please refer to reset.txt in this directory for common reset controller binding usage. Required properties: -- compatible: Should be st,stih407-softreset"; +- compatible: Should be "st,stih407-softreset"; - #reset-cells: 1, see below example: diff --git a/Documentation/devicetree/bindings/rng/brcm,bcm2835.txt b/Documentation/devicetree/bindings/rng/brcm,bcm2835.txt index 627b29531a32f183435f28a9f5f7a383d7a2bbde..aaac7975f61c7b6f6cb8736a8362272e40dd8464 100644 --- a/Documentation/devicetree/bindings/rng/brcm,bcm2835.txt +++ b/Documentation/devicetree/bindings/rng/brcm,bcm2835.txt @@ -14,11 +14,16 @@ Optional properties: - clocks : phandle to clock-controller plus clock-specifier pair - clock-names : "ipsec" as a clock name +Optional properties: + +- interrupts: specify the interrupt for the RNG block + Example: rng { - compatible = "brcm,bcm2835-rng"; - reg = <0x7e104000 0x10>; + compatible = "brcm,bcm2835-rng"; + reg = <0x7e104000 0x10>; + interrupts = <2 29>; }; rng@18033000 { diff --git a/Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt b/Documentation/devicetree/bindings/rng/samsung,exynos4-rng.txt similarity index 100% rename from Documentation/devicetree/bindings/crypto/samsung,exynos-rng4.txt rename to Documentation/devicetree/bindings/rng/samsung,exynos4-rng.txt diff --git a/Documentation/devicetree/bindings/sparc_sun_oracle_rng.txt b/Documentation/devicetree/bindings/rng/sparc_sun_oracle_rng.txt similarity index 100% rename from Documentation/devicetree/bindings/sparc_sun_oracle_rng.txt rename to Documentation/devicetree/bindings/rng/sparc_sun_oracle_rng.txt diff --git a/Documentation/devicetree/bindings/rtc/nxp,rtc-2123.txt b/Documentation/devicetree/bindings/rtc/nxp,rtc-2123.txt index 5cbc0b145a619ed081ab2f762ba98222a3f498bb..811124a36d167dda3f8a873e859eb5baf69a99c5 100644 --- a/Documentation/devicetree/bindings/rtc/nxp,rtc-2123.txt +++ b/Documentation/devicetree/bindings/rtc/nxp,rtc-2123.txt @@ -9,7 +9,7 @@ Optional properties: Example: -rtc: nxp,rtc-pcf2123@3 { +pcf2123: rtc@3 { compatible = "nxp,rtc-pcf2123" reg = <3> spi-cs-high; diff --git a/Documentation/devicetree/bindings/rtc/st,stm32-rtc.txt b/Documentation/devicetree/bindings/rtc/st,stm32-rtc.txt index a66692a08acedd13b199bb7865542db6db20e425..c920e273699117ba4635e0dd2b09c7bd416d99ed 100644 --- a/Documentation/devicetree/bindings/rtc/st,stm32-rtc.txt +++ b/Documentation/devicetree/bindings/rtc/st,stm32-rtc.txt @@ -1,23 +1,29 @@ STM32 Real Time Clock Required properties: -- compatible: can be either "st,stm32-rtc" or "st,stm32h7-rtc", depending on - the device is compatible with stm32(f4/f7) or stm32h7. +- compatible: can be one of the following: + - "st,stm32-rtc" for devices compatible with stm32(f4/f7). + - "st,stm32h7-rtc" for devices compatible with stm32h7. + - "st,stm32mp1-rtc" for devices compatible with stm32mp1. - reg: address range of rtc register set. - clocks: can use up to two clocks, depending on part used: - "rtc_ck": RTC clock source. - It is required on stm32(f4/f7) and stm32h7. - "pclk": RTC APB interface clock. It is not present on stm32(f4/f7). - It is required on stm32h7. + It is required on stm32(h7/mp1). - clock-names: must be "rtc_ck" and "pclk". - It is required only on stm32h7. + It is required on stm32(h7/mp1). - interrupt-parent: phandle for the interrupt controller. -- interrupts: rtc alarm interrupt. -- st,syscfg: phandle for pwrcfg, mandatory to disable/enable backup domain - (RTC registers) write protection. + It is required on stm32(f4/f7/h7). +- interrupts: rtc alarm interrupt. On stm32mp1, a second interrupt is required + for rtc alarm wakeup interrupt. +- st,syscfg: phandle/offset/mask triplet. The phandle to pwrcfg used to + access control register at offset, and change the dbp (Disable Backup + Protection) bit represented by the mask, mandatory to disable/enable backup + domain (RTC registers) write protection. + It is required on stm32(f4/f7/h7). -Optional properties (to override default rtc_ck parent clock): +Optional properties (to override default rtc_ck parent clock on stm32(f4/f7/h7): - assigned-clocks: reference to the rtc_ck clock entry. - assigned-clock-parents: phandle of the new parent clock of rtc_ck. @@ -31,7 +37,7 @@ Example: assigned-clock-parents = <&rcc 1 CLK_LSE>; interrupt-parent = <&exti>; interrupts = <17 1>; - st,syscfg = <&pwrcfg>; + st,syscfg = <&pwrcfg 0x00 0x100>; }; rtc: rtc@58004000 { @@ -44,5 +50,14 @@ Example: interrupt-parent = <&exti>; interrupts = <17 1>; interrupt-names = "alarm"; - st,syscfg = <&pwrcfg>; + st,syscfg = <&pwrcfg 0x00 0x100>; + }; + + rtc: rtc@5c004000 { + compatible = "st,stm32mp1-rtc"; + reg = <0x5c004000 0x400>; + clocks = <&rcc RTCAPB>, <&rcc RTC>; + clock-names = "pclk", "rtc_ck"; + interrupts-extended = <&intc GIC_SPI 3 IRQ_TYPE_NONE>, + <&exti 19 1>; }; diff --git a/Documentation/devicetree/bindings/serial/microchip,pic32-uart.txt b/Documentation/devicetree/bindings/serial/microchip,pic32-uart.txt index 7a34345d0ca368e05cd036e01c58920313cb6f70..c8dd440e97470b3f3c237545d2b7ba879caae411 100644 --- a/Documentation/devicetree/bindings/serial/microchip,pic32-uart.txt +++ b/Documentation/devicetree/bindings/serial/microchip,pic32-uart.txt @@ -8,7 +8,7 @@ Required properties: See: Documentation/devicetree/bindings/clock/clock-bindings.txt - pinctrl-names: A pinctrl state names "default" must be defined. - pinctrl-0: Phandle referencing pin configuration of the UART peripheral. - See: Documentation/devicetree/bindings/pinctrl/pinctrl-binding.txt + See: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt Optional properties: - cts-gpios: CTS pin for UART diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt new file mode 100644 index 0000000000000000000000000000000000000000..bcc612cc74232235e553072fddd7169e08f7794e --- /dev/null +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt @@ -0,0 +1,84 @@ +Qualcomm APR (Asynchronous Packet Router) binding + +This binding describes the Qualcomm APR. APR is a IPC protocol for +communication between Application processor and QDSP. APR is mainly +used for audio/voice services on the QDSP. + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,apr-v", example "qcom,apr-v2" + +- reg + Usage: required + Value type: + Definition: Destination processor ID. + Possible values are : + 1 - APR simulator + 2 - PC + 3 - MODEM + 4 - ADSP + 5 - APPS + 6 - MODEM2 + 7 - APPS2 + += APR SERVICES +Each subnode of the APR node represents service tied to this apr. The name +of the nodes are not important. The properties of these nodes are defined +by the individual bindings for the specific service +- All APR services MUST contain the following property: + +- reg + Usage: required + Value type: + Definition: APR Service ID + Possible values are : + 3 - DSP Core Service + 4 - Audio Front End Service. + 5 - Voice Stream Manager Service. + 6 - Voice processing manager. + 7 - Audio Stream Manager Service. + 8 - Audio Device Manager Service. + 9 - Multimode voice manager. + 10 - Core voice stream. + 11 - Core voice processor. + 12 - Ultrasound stream manager. + 13 - Listen stream manager. + += EXAMPLE +The following example represents a QDSP based sound card on a MSM8996 device +which uses apr as communication between Apps and QDSP. + + apr@4 { + compatible = "qcom,apr-v2"; + reg = ; + + q6core@3 { + compatible = "qcom,q6core"; + reg = ; + }; + + q6afe@4 { + compatible = "qcom,q6afe"; + reg = ; + + dais { + #sound-dai-cells = <1>; + hdmi@1 { + reg = <1>; + }; + }; + }; + + q6asm@7 { + compatible = "qcom,q6asm"; + reg = ; + ... + }; + + q6adm@8 { + compatible = "qcom,q6adm"; + reg = ; + ... + }; + }; diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt new file mode 100644 index 0000000000000000000000000000000000000000..68b7d6207e3d75acd51400da27e5ca292c5026d0 --- /dev/null +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt @@ -0,0 +1,119 @@ +Qualcomm Technologies, Inc. GENI Serial Engine QUP Wrapper Controller + +Generic Interface (GENI) based Qualcomm Universal Peripheral (QUP) wrapper +is a programmable module for supporting a wide range of serial interfaces +like UART, SPI, I2C, I3C, etc. A single QUP module can provide upto 8 Serial +Interfaces, using its internal Serial Engines. The GENI Serial Engine QUP +Wrapper controller is modeled as a node with zero or more child nodes each +representing a serial engine. + +Required properties: +- compatible: Must be "qcom,geni-se-qup". +- reg: Must contain QUP register address and length. +- clock-names: Must contain "m-ahb" and "s-ahb". +- clocks: AHB clocks needed by the device. + +Required properties if child node exists: +- #address-cells: Must be <1> for Serial Engine Address +- #size-cells: Must be <1> for Serial Engine Address Size +- ranges: Must be present + +Properties for children: + +A GENI based QUP wrapper controller node can contain 0 or more child nodes +representing serial devices. These serial devices can be a QCOM UART, I2C +controller, SPI controller, or some combination of aforementioned devices. +Please refer below the child node definitions for the supported serial +interface protocols. + +Qualcomm Technologies Inc. GENI Serial Engine based I2C Controller + +Required properties: +- compatible: Must be "qcom,geni-i2c". +- reg: Must contain QUP register address and length. +- interrupts: Must contain I2C interrupt. +- clock-names: Must contain "se". +- clocks: Serial engine core clock needed by the device. +- #address-cells: Must be <1> for I2C device address. +- #size-cells: Must be <0> as I2C addresses have no size component. + +Optional property: +- clock-frequency: Desired I2C bus clock frequency in Hz. + When missing default to 100000Hz. + +Child nodes should conform to I2C bus binding as described in i2c.txt. + +Qualcomm Technologies Inc. GENI Serial Engine based UART Controller + +Required properties: +- compatible: Must be "qcom,geni-debug-uart". +- reg: Must contain UART register location and length. +- interrupts: Must contain UART core interrupts. +- clock-names: Must contain "se". +- clocks: Serial engine core clock needed by the device. + +Qualcomm Technologies Inc. GENI Serial Engine based SPI Controller + +Required properties: +- compatible: Must contain "qcom,geni-spi". +- reg: Must contain SPI register location and length. +- interrupts: Must contain SPI controller interrupts. +- clock-names: Must contain "se". +- clocks: Serial engine core clock needed by the device. +- spi-max-frequency: Specifies maximum SPI clock frequency, units - Hz. +- #address-cells: Must be <1> to define a chip select address on + the SPI bus. +- #size-cells: Must be <0>. + +SPI slave nodes must be children of the SPI master node and conform to SPI bus +binding as described in Documentation/devicetree/bindings/spi/spi-bus.txt. + +Example: + geniqup@8c0000 { + compatible = "qcom,geni-se-qup"; + reg = <0x8c0000 0x6000>; + clock-names = "m-ahb", "s-ahb"; + clocks = <&clock_gcc GCC_QUPV3_WRAP_0_M_AHB_CLK>, + <&clock_gcc GCC_QUPV3_WRAP_0_S_AHB_CLK>; + #address-cells = <1>; + #size-cells = <1>; + ranges; + + i2c0: i2c@a94000 { + compatible = "qcom,geni-i2c"; + reg = <0xa94000 0x4000>; + interrupts = ; + clock-names = "se"; + clocks = <&clock_gcc GCC_QUPV3_WRAP0_S5_CLK>; + pinctrl-names = "default", "sleep"; + pinctrl-0 = <&qup_1_i2c_5_active>; + pinctrl-1 = <&qup_1_i2c_5_sleep>; + #address-cells = <1>; + #size-cells = <0>; + }; + + uart0: serial@a88000 { + compatible = "qcom,geni-debug-uart"; + reg = <0xa88000 0x7000>; + interrupts = ; + clock-names = "se"; + clocks = <&clock_gcc GCC_QUPV3_WRAP0_S0_CLK>; + pinctrl-names = "default", "sleep"; + pinctrl-0 = <&qup_1_uart_3_active>; + pinctrl-1 = <&qup_1_uart_3_sleep>; + }; + + spi0: spi@a84000 { + compatible = "qcom,geni-spi"; + reg = <0xa84000 0x4000>; + interrupts = ; + clock-names = "se"; + clocks = <&clock_gcc GCC_QUPV3_WRAP0_S0_CLK>; + pinctrl-names = "default", "sleep"; + pinctrl-0 = <&qup_1_spi_2_active>; + pinctrl-1 = <&qup_1_spi_2_sleep>; + spi-max-frequency = <19200000>; + #address-cells = <1>; + #size-cells = <0>; + }; + } diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt index a48049ccf6d0dd4cf332609d7a7b67326c58cf27..89e1cb9212f6899a51851ee7488bd6b73cc24db6 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt @@ -22,6 +22,7 @@ resources. "qcom,rpm-apq8084" "qcom,rpm-msm8916" "qcom,rpm-msm8974" + "qcom,rpm-msm8998" - qcom,smd-channels: Usage: required diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt index ea1dc75ec9eaca099589114a3c13322c1d193625..234ae2256501b7511976ffe6272d34a5c327c769 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt @@ -22,9 +22,15 @@ The edge is described by the following properties: Definition: should specify the IRQ used by the remote processor to signal this processor about communication related updates -- qcom,ipc: +- mboxes: Usage: required Value type: + Definition: reference to the associated doorbell in APCS, as described + in mailbox/mailbox.txt + +- qcom,ipc: + Usage: required, unless mboxes is specified + Value type: Definition: three entries specifying the outgoing ipc bit used for signaling the remote processor: - phandle to a syscon node representing the apcs registers diff --git a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt index 301d2a9bc1b8bb2ab0e213e6656b03691c2c71e8..5d49d0a2ff29df92201d62f1171eeaec5e5f9d22 100644 --- a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt +++ b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt @@ -5,6 +5,10 @@ powered up/down by software based on different application scenes to save power. Required properties for power domain controller: - compatible: Should be one of the following. + "rockchip,px30-power-controller" - for PX30 SoCs. + "rockchip,rk3036-power-controller" - for RK3036 SoCs. + "rockchip,rk3128-power-controller" - for RK3128 SoCs. + "rockchip,rk3228-power-controller" - for RK3228 SoCs. "rockchip,rk3288-power-controller" - for RK3288 SoCs. "rockchip,rk3328-power-controller" - for RK3328 SoCs. "rockchip,rk3366-power-controller" - for RK3366 SoCs. @@ -17,6 +21,10 @@ Required properties for power domain controller: Required properties for power domain sub nodes: - reg: index of the power domain, should use macros in: + "include/dt-bindings/power/px30-power.h" - for PX30 type power domain. + "include/dt-bindings/power/rk3036-power.h" - for RK3036 type power domain. + "include/dt-bindings/power/rk3128-power.h" - for RK3128 type power domain. + "include/dt-bindings/power/rk3228-power.h" - for RK3228 type power domain. "include/dt-bindings/power/rk3288-power.h" - for RK3288 type power domain. "include/dt-bindings/power/rk3328-power.h" - for RK3328 type power domain. "include/dt-bindings/power/rk3366-power.h" - for RK3366 type power domain. @@ -93,6 +101,10 @@ Node of a device using power domains must have a power-domains property, containing a phandle to the power device node and an index specifying which power domain to use. The index should use macros in: + "include/dt-bindings/power/px30-power.h" - for px30 type power domain. + "include/dt-bindings/power/rk3036-power.h" - for rk3036 type power domain. + "include/dt-bindings/power/rk3128-power.h" - for rk3128 type power domain. + "include/dt-bindings/power/rk3128-power.h" - for rk3228 type power domain. "include/dt-bindings/power/rk3288-power.h" - for rk3288 type power domain. "include/dt-bindings/power/rk3328-power.h" - for rk3328 type power domain. "include/dt-bindings/power/rk3366-power.h" - for rk3366 type power domain. diff --git a/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt b/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt index 77cd42cc5f54897b60371af222d62f847e97326f..b025770eeb928876748ec9c10ab4cb6abb0a305f 100644 --- a/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt +++ b/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt @@ -17,7 +17,8 @@ pool management. Required properties: -- compatible : Must be "ti,keystone-navigator-qmss"; +- compatible : Must be "ti,keystone-navigator-qmss". + : Must be "ti,66ak2g-navss-qm" for QMSS on K2G SoC. - clocks : phandle to the reference clock for this device. - queue-range : total range of queue numbers for the device. - linkram0 :
for internal link ram, where size is the total @@ -39,6 +40,12 @@ Required properties: - Descriptor memory setup region. - Queue Management/Queue Proxy region for queue Push. - Queue Management/Queue Proxy region for queue Pop. + +For QMSS on K2G SoC, following QM reg indexes are used in that order + - Queue Peek region. + - Queue configuration region. + - Queue Management/Queue Proxy region for queue Push/Pop. + - queue-pools : child node classifying the queue ranges into pools. Queue ranges are grouped into 3 type of pools: - qpend : pool of qpend(interruptible) queues diff --git a/Documentation/devicetree/bindings/sound/adi,ssm2305.txt b/Documentation/devicetree/bindings/sound/adi,ssm2305.txt new file mode 100644 index 0000000000000000000000000000000000000000..a9c9d83c8a30af344545234adebcc6236f513291 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/adi,ssm2305.txt @@ -0,0 +1,14 @@ +Analog Devices SSM2305 Speaker Amplifier +======================================== + +Required properties: + - compatible : "adi,ssm2305" + - shutdown-gpios : The gpio connected to the shutdown pin. + The gpio signal is ACTIVE_LOW. + +Example: + +ssm2305: analog-amplifier { + compatible = "adi,ssm2305"; + shutdown-gpios = <&gpio3 20 GPIO_ACTIVE_LOW>; +}; diff --git a/Documentation/devicetree/bindings/sound/atmel-i2s.txt b/Documentation/devicetree/bindings/sound/atmel-i2s.txt new file mode 100644 index 0000000000000000000000000000000000000000..735368b8a73fe5149fbb13f03741fecf6dad4094 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/atmel-i2s.txt @@ -0,0 +1,47 @@ +* Atmel I2S controller + +Required properties: +- compatible: Should be "atmel,sama5d2-i2s". +- reg: Should be the physical base address of the controller and the + length of memory mapped region. +- interrupts: Should contain the interrupt for the controller. +- dmas: Should be one per channel name listed in the dma-names property, + as described in atmel-dma.txt and dma.txt files. +- dma-names: Two dmas have to be defined, "tx" and "rx". + This IP also supports one shared channel for both rx and tx; + if this mode is used, one "rx-tx" name must be used. +- clocks: Must contain an entry for each entry in clock-names. + Please refer to clock-bindings.txt. +- clock-names: Should be one of each entry matching the clocks phandles list: + - "pclk" (peripheral clock) Required. + - "gclk" (generated clock) Optional (1). + - "aclk" (Audio PLL clock) Optional (1). + - "muxclk" (I2S mux clock) Optional (1). + +Optional properties: +- pinctrl-0: Should specify pin control groups used for this controller. +- princtrl-names: Should contain only one value - "default". + + +(1) : Only the peripheral clock is required. The generated clock, the Audio + PLL clock adn the I2S mux clock are optional and should only be set + together, when Master Mode is required. + +Example: + + i2s@f8050000 { + compatible = "atmel,sama5d2-i2s"; + reg = <0xf8050000 0x300>; + interrupts = <54 IRQ_TYPE_LEVEL_HIGH 7>; + dmas = <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(31))>, + <&dma0 + (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1) | + AT91_XDMAC_DT_PERID(32))>; + dma-names = "tx", "rx"; + clocks = <&i2s0_clk>, <&i2s0_gclk>, <&audio_pll_pmc>, <&i2s0muxck>; + clock-names = "pclk", "gclk", "aclk", "muxclk"; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_i2s0_default>; + }; diff --git a/Documentation/devicetree/bindings/sound/cs42xx8.txt b/Documentation/devicetree/bindings/sound/cs42xx8.txt index f631fbca6284325378e959683d737fc787035a17..8619a156d038bdd76e342351fe83af63b4105746 100644 --- a/Documentation/devicetree/bindings/sound/cs42xx8.txt +++ b/Documentation/devicetree/bindings/sound/cs42xx8.txt @@ -16,7 +16,7 @@ Required properties: Example: -codec: cs42888@48 { +cs42888: codec@48 { compatible = "cirrus,cs42888"; reg = <0x48>; clocks = <&codec_mclk 0>; diff --git a/Documentation/devicetree/bindings/sound/fsl,asrc.txt b/Documentation/devicetree/bindings/sound/fsl,asrc.txt index f5a14115b459482ea7ffa2ce43c9b84096cdb3df..1d4d9f938689f39b5214a730991ac068aec02f35 100644 --- a/Documentation/devicetree/bindings/sound/fsl,asrc.txt +++ b/Documentation/devicetree/bindings/sound/fsl,asrc.txt @@ -31,14 +31,16 @@ Required properties: it. This property is optional depending on the SoC design. - - big-endian : If this property is absent, the little endian mode - will be in use as default. Otherwise, the big endian - mode will be in use for all the device registers. - - fsl,asrc-rate : Defines a mutual sample rate used by DPCM Back Ends. - fsl,asrc-width : Defines a mutual sample width used by DPCM Back Ends. +Optional properties: + + - big-endian : If this property is absent, the little endian mode + will be in use as default. Otherwise, the big endian + mode will be in use for all the device registers. + Example: asrc: asrc@2034000 { diff --git a/Documentation/devicetree/bindings/sound/fsl,esai.txt b/Documentation/devicetree/bindings/sound/fsl,esai.txt index cacd18bb9ba63978a73d569a00db21d6389b9b21..5b9914367610315d8aaab3a614dd318de9408c6d 100644 --- a/Documentation/devicetree/bindings/sound/fsl,esai.txt +++ b/Documentation/devicetree/bindings/sound/fsl,esai.txt @@ -42,6 +42,8 @@ Required properties: means all the settings for Receiving would be duplicated from Transmition related registers. +Optional properties: + - big-endian : If this property is absent, the native endian mode will be in use as default, or the big endian mode will be in use for all the device registers. diff --git a/Documentation/devicetree/bindings/sound/fsl,spdif.txt b/Documentation/devicetree/bindings/sound/fsl,spdif.txt index 38cfa75734415e018107c6a0a2cc128497cc059f..8b324f82a7828013c06a1688c80c7316558c47ce 100644 --- a/Documentation/devicetree/bindings/sound/fsl,spdif.txt +++ b/Documentation/devicetree/bindings/sound/fsl,spdif.txt @@ -33,6 +33,8 @@ Required properties: it. This property is optional depending on the SoC design. +Optional properties: + - big-endian : If this property is absent, the native endian mode will be in use as default, or the big endian mode will be in use for all the device registers. diff --git a/Documentation/devicetree/bindings/sound/fsl-sai.txt b/Documentation/devicetree/bindings/sound/fsl-sai.txt index 740b467adf7d1d46f318d5a2a6bf74f35cfa80bc..dd9e59738e0820b4db57b737e34a37c11c90fd55 100644 --- a/Documentation/devicetree/bindings/sound/fsl-sai.txt +++ b/Documentation/devicetree/bindings/sound/fsl-sai.txt @@ -28,9 +28,6 @@ Required properties: pinctrl-names. See ../pinctrl/pinctrl-bindings.txt for details of the property values. - - big-endian : Boolean property, required if all the FTM_PWM - registers are big-endian rather than little-endian. - - lsb-first : Configures whether the LSB or the MSB is transmitted first for the fifo data. If this property is absent, the MSB is transmitted first as default, or the LSB @@ -48,6 +45,11 @@ Required properties: receive data by following their own bit clocks and frame sync clocks separately. +Optional properties: + + - big-endian : Boolean property, required if all the SAI + registers are big-endian rather than little-endian. + Optional properties (for mx6ul): - fsl,sai-mclk-direction-output: This is a boolean property. If present, diff --git a/Documentation/devicetree/bindings/sound/mt2701-afe-pcm.txt b/Documentation/devicetree/bindings/sound/mt2701-afe-pcm.txt index e2f7f49512155e1084c85ce9e3dd6b9770285a97..560762e0a1681b37ef1a66750e14474868a137c8 100644 --- a/Documentation/devicetree/bindings/sound/mt2701-afe-pcm.txt +++ b/Documentation/devicetree/bindings/sound/mt2701-afe-pcm.txt @@ -1,7 +1,9 @@ Mediatek AFE PCM controller for mt2701 Required properties: -- compatible = "mediatek,mt2701-audio"; +- compatible: should be one of the followings. + - "mediatek,mt2701-audio" + - "mediatek,mt7622-audio" - interrupts: should contain AFE and ASYS interrupts - interrupt-names: should be "afe" and "asys" - power-domains: should define the power domain diff --git a/Documentation/devicetree/bindings/sound/mt6351.txt b/Documentation/devicetree/bindings/sound/mt6351.txt new file mode 100644 index 0000000000000000000000000000000000000000..7fb2cb99245ed5e966cf915c84f58fec1802f5bb --- /dev/null +++ b/Documentation/devicetree/bindings/sound/mt6351.txt @@ -0,0 +1,16 @@ +Mediatek MT6351 Audio Codec + +The communication between MT6351 and SoC is through Mediatek PMIC wrapper. +For more detail, please visit Mediatek PMIC wrapper documentation. + +Must be a child node of PMIC wrapper. + +Required properties: + +- compatible : "mediatek,mt6351-sound". + +Example: + +mt6351_snd { + compatible = "mediatek,mt6351-sound"; +}; diff --git a/Documentation/devicetree/bindings/sound/mt6797-afe-pcm.txt b/Documentation/devicetree/bindings/sound/mt6797-afe-pcm.txt new file mode 100644 index 0000000000000000000000000000000000000000..0ae29de15bfd6419f389a01c79753a6a999dcfa4 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/mt6797-afe-pcm.txt @@ -0,0 +1,42 @@ +Mediatek AFE PCM controller for mt6797 + +Required properties: +- compatible = "mediatek,mt6797-audio"; +- reg: register location and size +- interrupts: should contain AFE interrupt +- power-domains: should define the power domain +- clocks: Must contain an entry for each entry in clock-names +- clock-names: should have these clock names: + "infra_sys_audio_clk", + "infra_sys_audio_26m", + "mtkaif_26m_clk", + "top_mux_audio", + "top_mux_aud_intbus", + "top_sys_pll3_d4", + "top_sys_pll1_d4", + "top_clk26m_clk"; + +Example: + + afe: mt6797-afe-pcm@11220000 { + compatible = "mediatek,mt6797-audio"; + reg = <0 0x11220000 0 0x1000>; + interrupts = ; + power-domains = <&scpsys MT6797_POWER_DOMAIN_AUDIO>; + clocks = <&infrasys CLK_INFRA_AUDIO>, + <&infrasys CLK_INFRA_AUDIO_26M>, + <&infrasys CLK_INFRA_AUDIO_26M_PAD_TOP>, + <&topckgen CLK_TOP_MUX_AUDIO>, + <&topckgen CLK_TOP_MUX_AUD_INTBUS>, + <&topckgen CLK_TOP_SYSPLL3_D4>, + <&topckgen CLK_TOP_SYSPLL1_D4>, + <&clk26m>; + clock-names = "infra_sys_audio_clk", + "infra_sys_audio_26m", + "mtkaif_26m_clk", + "top_mux_audio", + "top_mux_aud_intbus", + "top_sys_pll3_d4", + "top_sys_pll1_d4", + "top_clk26m_clk"; + }; diff --git a/Documentation/devicetree/bindings/sound/mt6797-mt6351.txt b/Documentation/devicetree/bindings/sound/mt6797-mt6351.txt new file mode 100644 index 0000000000000000000000000000000000000000..1d95a8840f196be1cb0e7e314fc482b0e150cb0c --- /dev/null +++ b/Documentation/devicetree/bindings/sound/mt6797-mt6351.txt @@ -0,0 +1,14 @@ +MT6797 with MT6351 CODEC + +Required properties: +- compatible: "mediatek,mt6797-mt6351-sound" +- mediatek,platform: the phandle of MT6797 ASoC platform +- mediatek,audio-codec: the phandles of MT6351 codec + +Example: + + sound { + compatible = "mediatek,mt6797-mt6351-sound"; + mediatek,audio-codec = <&mt6351_snd>; + mediatek,platform = <&afe>; + }; diff --git a/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt b/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt index 6a4aadc4ce06b27ff059c64f6c438d0fef863b21..84b28dbe9f15452bbe341f3dbf5e6f5452b72a19 100644 --- a/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt +++ b/Documentation/devicetree/bindings/sound/qcom,apq8016-sbc.txt @@ -30,7 +30,7 @@ Required properties: Board connectors: * Headset Mic - * Secondary Mic", + * Secondary Mic * DMIC * Ext Spk diff --git a/Documentation/devicetree/bindings/sound/qcom,apq8096.txt b/Documentation/devicetree/bindings/sound/qcom,apq8096.txt new file mode 100644 index 0000000000000000000000000000000000000000..c7600a93ab39e58bb62cc02e1f77a2d5132f1b08 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/qcom,apq8096.txt @@ -0,0 +1,109 @@ +* Qualcomm Technologies APQ8096 ASoC sound card driver + +This binding describes the APQ8096 sound card, which uses qdsp for audio. + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,apq8096-sndcard" + +- qcom,audio-routing: + Usage: Optional + Value type: + Definition: A list of the connections between audio components. + Each entry is a pair of strings, the first being the + connection's sink, the second being the connection's + source. Valid names could be power supplies, MicBias + of codec and the jacks on the board: + Valid names include: + + Board Connectors: + "Headphone Left" + "Headphone Right" + "Earphone" + "Line Out1" + "Line Out2" + "Line Out3" + "Line Out4" + "Analog Mic1" + "Analog Mic2" + "Analog Mic3" + "Analog Mic4" + "Analog Mic5" + "Analog Mic6" + "Digital Mic2" + "Digital Mic3" + + Audio pins and MicBias on WCD9335 Codec: + "MIC_BIAS1" + "MIC_BIAS2" + "MIC_BIAS3" + "MIC_BIAS4" + "AMIC1" + "AMIC2" + "AMIC3" + "AMIC4" + "AMIC5" + "AMIC6" + "AMIC6" + "DMIC1" + "DMIC2" + "DMIC3" += dailinks +Each subnode of sndcard represents either a dailink, and subnodes of each +dailinks would be cpu/codec/platform dais. + +- link-name: + Usage: required + Value type: + Definition: User friendly name for dai link + += CPU, PLATFORM, CODEC dais subnodes +- cpu: + Usage: required + Value type: + Definition: cpu dai sub-node + +- codec: + Usage: Optional + Value type: + Definition: codec dai sub-node + +- platform: + Usage: Optional + Value type: + Definition: platform dai sub-node + +- sound-dai: + Usage: required + Value type: + Definition: dai phandle/s and port of CPU/CODEC/PLATFORM node. + +Example: + +audio { + compatible = "qcom,apq8096-sndcard"; + qcom,model = "DB820c"; + + mm1-dai-link { + link-name = "MultiMedia1"; + cpu { + sound-dai = <&q6asmdai MSM_FRONTEND_DAI_MULTIMEDIA1>; + }; + }; + + hdmi-dai-link { + link-name = "HDMI Playback"; + cpu { + sound-dai = <&q6afe HDMI_RX>; + }; + + platform { + sound-dai = <&q6adm>; + }; + + codec { + sound-dai = <&hdmi 0>; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/sound/qcom,q6adm.txt b/Documentation/devicetree/bindings/sound/qcom,q6adm.txt new file mode 100644 index 0000000000000000000000000000000000000000..cb709e5dbc44fb0238c143acf14acdb87169b05b --- /dev/null +++ b/Documentation/devicetree/bindings/sound/qcom,q6adm.txt @@ -0,0 +1,33 @@ +Qualcomm Audio Device Manager (Q6ADM) binding + +Q6ADM is one of the APR audio service on Q6DSP. +Please refer to qcom,apr.txt for details of the coommon apr service bindings +used by the apr service device. + +- but must contain the following property: + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,q6adm-v.". + Or "qcom,q6adm" where the version number can be queried + from DSP. + example "qcom,q6adm-v2.0" + + += ADM routing +"routing" subnode of the ADM node represents adm routing specific configuration + +- #sound-dai-cells + Usage: required + Value type: + Definition: Must be 0 + += EXAMPLE +q6adm@8 { + compatible = "qcom,q6adm"; + reg = ; + q6routing: routing { + #sound-dai-cells = <0>; + }; +}; diff --git a/Documentation/devicetree/bindings/sound/qcom,q6afe.txt b/Documentation/devicetree/bindings/sound/qcom,q6afe.txt new file mode 100644 index 0000000000000000000000000000000000000000..bdbf87df8c0be3933604ff177607b3ee314c3d70 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/qcom,q6afe.txt @@ -0,0 +1,172 @@ +Qualcomm Audio Front End (Q6AFE) binding + +AFE is one of the APR audio service on Q6DSP +Please refer to qcom,apr.txt for details of the common apr service bindings +used by all apr services. Must contain the following properties. + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,q6afe-v." + Or "qcom,q6afe" where the version number can be queried + from DSP. + example "qcom,q6afe" + += AFE DAIs (Digial Audio Interface) +"dais" subnode of the AFE node. It represents afe dais, each afe dai is a +subnode of "dais" representing board specific dai setup. +"dais" node should have following properties followed by dai children. + +- #sound-dai-cells + Usage: required + Value type: + Definition: Must be 1 + +- #address-cells + Usage: required + Value type: + Definition: Must be 1 + +- #size-cells + Usage: required + Value type: + Definition: Must be 0 + +== AFE DAI is subnode of "dais" and represent a dai, it includes board specific +configuration of each dai. Must contain the following properties. + +- reg + Usage: required + Value type: + Definition: Must be dai id + +- qcom,sd-lines + Usage: required for mi2s interface + Value type: + Definition: Must be list of serial data lines used by this dai. + should be one or more of the 1-4 sd lines. + + - qcom,tdm-sync-mode: + Usage: required for tdm interface + Value type: + Definition: Synchronization mode. + 0 - Short sync bit mode + 1 - Long sync mode + 2 - Short sync slot mode + + - qcom,tdm-sync-src: + Usage: required for tdm interface + Value type: + Definition: Synchronization source. + 0 - External source + 1 - Internal source + + - qcom,tdm-data-out: + Usage: required for tdm interface + Value type: + Definition: Data out signal to drive with other masters. + 0 - Disable + 1 - Enable + + - qcom,tdm-invert-sync: + Usage: required for tdm interface + Value type: + Definition: Invert the sync. + 0 - Normal + 1 - Invert + + - qcom,tdm-data-delay: + Usage: required for tdm interface + Value type: + Definition: Number of bit clock to delay data + with respect to sync edge. + 0 - 0 bit clock cycle + 1 - 1 bit clock cycle + 2 - 2 bit clock cycle + + - qcom,tdm-data-align: + Usage: required for tdm interface + Value type: + Definition: Indicate how data is packed + within the slot. For example, 32 slot width in case of + sample bit width is 24. + 0 - MSB + 1 - LSB + += EXAMPLE + +q6afe@4 { + compatible = "qcom,q6afe"; + reg = ; + + dais { + #sound-dai-cells = <1>; + #address-cells = <1>; + #size-cells = <0>; + + hdmi@1 { + reg = <1>; + }; + + tdm@24 { + reg = <24>; + qcom,tdm-sync-mode = <1>: + qcom,tdm-sync-src = <1>; + qcom,tdm-data-out = <0>; + qcom,tdm-invert-sync = <1>; + qcom,tdm-data-delay = <1>; + qcom,tdm-data-align = <0>; + + }; + + tdm@25 { + reg = <25>; + qcom,tdm-sync-mode = <1>: + qcom,tdm-sync-src = <1>; + qcom,tdm-data-out = <0>; + qcom,tdm-invert-sync = <1>; + qcom,tdm-data-delay <1>: + qcom,tdm-data-align = <0>; + }; + + prim-mi2s-rx@16 { + reg = <16>; + qcom,sd-lines = <1 3>; + }; + + prim-mi2s-tx@17 { + reg = <17>; + qcom,sd-lines = <2>; + }; + + sec-mi2s-rx@18 { + reg = <18>; + qcom,sd-lines = <1 4>; + }; + + sec-mi2s-tx@19 { + reg = <19>; + qcom,sd-lines = <2>; + }; + + tert-mi2s-rx@20 { + reg = <20>; + qcom,sd-lines = <2 4>; + }; + + tert-mi2s-tx@21 { + reg = <21>; + qcom,sd-lines = <1>; + }; + + quat-mi2s-rx@22 { + reg = <22>; + qcom,sd-lines = <1>; + }; + + quat-mi2s-tx@23 { + reg = <23>; + qcom,sd-lines = <2>; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/sound/qcom,q6asm.txt b/Documentation/devicetree/bindings/sound/qcom,q6asm.txt new file mode 100644 index 0000000000000000000000000000000000000000..2178eb91146f84a8ade34294f535df5240c6b6c5 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/qcom,q6asm.txt @@ -0,0 +1,33 @@ +Qualcomm Audio Stream Manager (Q6ASM) binding + +Q6ASM is one of the APR audio service on Q6DSP. +Please refer to qcom,apr.txt for details of the common apr service bindings +used by the apr service device. + +- but must contain the following property: + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,q6asm-v.". + Or "qcom,q6asm" where the version number can be queried + from DSP. + example "qcom,q6asm-v2.0" + += ASM DAIs (Digial Audio Interface) +"dais" subnode of the ASM node represents dai specific configuration + +- #sound-dai-cells + Usage: required + Value type: + Definition: Must be 1 + += EXAMPLE + +q6asm@7 { + compatible = "qcom,q6asm"; + reg = ; + q6asmdai: dais { + #sound-dai-cells = <1>; + }; +}; diff --git a/Documentation/devicetree/bindings/sound/qcom,q6core.txt b/Documentation/devicetree/bindings/sound/qcom,q6core.txt new file mode 100644 index 0000000000000000000000000000000000000000..7f36ff8bec1817c679315da642e6de10989c0eec --- /dev/null +++ b/Documentation/devicetree/bindings/sound/qcom,q6core.txt @@ -0,0 +1,21 @@ +Qualcomm ADSP Core service binding + +Q6CORE is one of the APR audio service on Q6DSP. +Please refer to qcom,apr.txt for details of the common apr service bindings +used by the apr service device. + +- but must contain the following property: + +- compatible: + Usage: required + Value type: + Definition: must be "qcom,q6core-v.". + Or "qcom,q6core" where the version number can be queried + from DSP. + example "qcom,q6core-v2.0" + += EXAMPLE +q6core@3 { + compatible = "qcom,q6core"; + reg = ; +}; diff --git a/Documentation/devicetree/bindings/sound/rt274.txt b/Documentation/devicetree/bindings/sound/rt274.txt index e9a6178c78cfe09c56a303495ec4d207ed53b11b..791a1bd767b9166e9330bf9a0fc4f90dfcfc62b2 100644 --- a/Documentation/devicetree/bindings/sound/rt274.txt +++ b/Documentation/devicetree/bindings/sound/rt274.txt @@ -26,7 +26,7 @@ Pins on the device (for linking into audio routes) for RT274: Example: -codec: rt274@1c { +rt274: codec@1c { compatible = "realtek,rt274"; reg = <0x1c>; interrupts = <7 IRQ_TYPE_EDGE_FALLING>; diff --git a/Documentation/devicetree/bindings/sound/rt5514.txt b/Documentation/devicetree/bindings/sound/rt5514.txt index 4f33b0d96afeb106bb457ed071ab3f4779d80ec5..b25ed08c7a5a0113761c18a17d3ceef50d0d6d96 100644 --- a/Documentation/devicetree/bindings/sound/rt5514.txt +++ b/Documentation/devicetree/bindings/sound/rt5514.txt @@ -32,7 +32,7 @@ Pins on the device (for linking into audio routes) for I2C: Example: -codec: rt5514@57 { +rt5514: codec@57 { compatible = "realtek,rt5514"; reg = <0x57>; }; diff --git a/Documentation/devicetree/bindings/sound/rt5616.txt b/Documentation/devicetree/bindings/sound/rt5616.txt index e410858185596a0fb74ec5812cef3dd3e6282762..540a4bf252e48eb4dcce34061b0b9ee537ab0d23 100644 --- a/Documentation/devicetree/bindings/sound/rt5616.txt +++ b/Documentation/devicetree/bindings/sound/rt5616.txt @@ -26,7 +26,7 @@ Pins on the device (for linking into audio routes) for RT5616: Example: -codec: rt5616@1b { +rt5616: codec@1b { compatible = "realtek,rt5616"; reg = <0x1b>; }; diff --git a/Documentation/devicetree/bindings/sound/rt5640.txt b/Documentation/devicetree/bindings/sound/rt5640.txt index 57fe64643050a5e9f41072dd0d4ae0376f5d6958..e40e4893eed8014352cb81ab86c42f21da38e123 100644 --- a/Documentation/devicetree/bindings/sound/rt5640.txt +++ b/Documentation/devicetree/bindings/sound/rt5640.txt @@ -22,6 +22,41 @@ Optional properties: - realtek,ldo1-en-gpios : The GPIO that controls the CODEC's LDO1_EN pin. +- realtek,dmic1-data-pin + 0: dmic1 is not used + 1: using IN1P pin as dmic1 data pin + 2: using GPIO3 pin as dmic1 data pin + +- realtek,dmic2-data-pin + 0: dmic2 is not used + 1: using IN1N pin as dmic2 data pin + 2: using GPIO4 pin as dmic2 data pin + +- realtek,jack-detect-source + u32. Valid values: + 0: jack-detect is not used + 1: Use GPIO1 for jack-detect + 2: Use JD1_IN4P for jack-detect + 3: Use JD2_IN4N for jack-detect + 4: Use GPIO2 for jack-detect + 5: Use GPIO3 for jack-detect + 6: Use GPIO4 for jack-detect + +- realtek,jack-detect-not-inverted + bool. Normal jack-detect switches give an inverted signal, set this bool + in the rare case you've a jack-detect switch which is not inverted. + +- realtek,over-current-threshold-microamp + u32, micbias over-current detection threshold in µA, valid values are + 600, 1500 and 2000µA. + +- realtek,over-current-scale-factor + u32, micbias over-current detection scale-factor, valid values are: + 0: Scale current by 0.5 + 1: Scale current by 0.75 + 2: Scale current by 1.0 + 3: Scale current by 1.5 + Pins on the device (for linking into audio routes) for RT5639/RT5640: * DMIC1 diff --git a/Documentation/devicetree/bindings/sound/rt5645.txt b/Documentation/devicetree/bindings/sound/rt5645.txt index 7cee1f518f59fc7ec2fccc1882a939b2a2589a37..a03f9a872a71667ddc2ab9197ece633ee337af96 100644 --- a/Documentation/devicetree/bindings/sound/rt5645.txt +++ b/Documentation/devicetree/bindings/sound/rt5645.txt @@ -69,4 +69,4 @@ codec: rt5650@1a { realtek,dmic-en = "true"; realtek,en-jd-func = "true"; realtek,jd-mode = <3>; -}; \ No newline at end of file +}; diff --git a/Documentation/devicetree/bindings/sound/rt5651.txt b/Documentation/devicetree/bindings/sound/rt5651.txt index b85221864cec8a607f62dc27c41f4f37e05b941e..a41199a5cd79b5225b2b732bacfef782d9286d54 100644 --- a/Documentation/devicetree/bindings/sound/rt5651.txt +++ b/Documentation/devicetree/bindings/sound/rt5651.txt @@ -50,7 +50,7 @@ Pins on the device (for linking into audio routes) for RT5651: Example: -codec: rt5651@1a { +rt5651: codec@1a { compatible = "realtek,rt5651"; reg = <0x1a>; realtek,dmic-en = "true"; diff --git a/Documentation/devicetree/bindings/sound/rt5663.txt b/Documentation/devicetree/bindings/sound/rt5663.txt index 497bcfc58b712173b05ce876600f1595a3d20125..23386446c63d6fde5b63c35b8badd989a8ada56b 100644 --- a/Documentation/devicetree/bindings/sound/rt5663.txt +++ b/Documentation/devicetree/bindings/sound/rt5663.txt @@ -47,7 +47,7 @@ Pins on the device (for linking into audio routes) for RT5663: Example: -codec: rt5663@12 { +rt5663: codec@12 { compatible = "realtek,rt5663"; reg = <0x12>; interrupts = <7 IRQ_TYPE_EDGE_FALLING>; diff --git a/Documentation/devicetree/bindings/sound/rt5668.txt b/Documentation/devicetree/bindings/sound/rt5668.txt new file mode 100644 index 0000000000000000000000000000000000000000..c88b96e7764b78ae850d3a451cfb40ece40d361c --- /dev/null +++ b/Documentation/devicetree/bindings/sound/rt5668.txt @@ -0,0 +1,50 @@ +RT5668B audio CODEC + +This device supports I2C only. + +Required properties: + +- compatible : "realtek,rt5668b" + +- reg : The I2C address of the device. + +Optional properties: + +- interrupts : The CODEC's interrupt output. + +- realtek,dmic1-data-pin + 0: dmic1 is not used + 1: using GPIO2 pin as dmic1 data pin + 2: using GPIO5 pin as dmic1 data pin + +- realtek,dmic1-clk-pin + 0: using GPIO1 pin as dmic1 clock pin + 1: using GPIO3 pin as dmic1 clock pin + +- realtek,jd-src + 0: No JD is used + 1: using JD1 as JD source + +- realtek,ldo1-en-gpios : The GPIO that controls the CODEC's LDO1_EN pin. + +Pins on the device (for linking into audio routes) for RT5668B: + + * DMIC L1 + * DMIC R1 + * IN1P + * HPOL + * HPOR + +Example: + +rt5668 { + compatible = "realtek,rt5668b"; + reg = <0x1a>; + interrupt-parent = <&gpio>; + interrupts = ; + realtek,ldo1-en-gpios = + <&gpio TEGRA_GPIO(R, 2) GPIO_ACTIVE_HIGH>; + realtek,dmic1-data-pin = <1>; + realtek,dmic1-clk-pin = <1>; + realtek,jd-src = <1>; +}; diff --git a/Documentation/devicetree/bindings/sound/sgtl5000.txt b/Documentation/devicetree/bindings/sound/sgtl5000.txt index 9a36c7e2a1434ad60bb61436d5da9df29d05f8f3..0f214457476f2aa399925720ee8b42c934b89568 100644 --- a/Documentation/devicetree/bindings/sound/sgtl5000.txt +++ b/Documentation/devicetree/bindings/sound/sgtl5000.txt @@ -39,7 +39,7 @@ VDDIO 1.8V 2.5V 3.3V Example: -codec: sgtl5000@a { +sgtl5000: codec@a { compatible = "fsl,sgtl5000"; reg = <0x0a>; #sound-dai-cells = <0>; diff --git a/Documentation/devicetree/bindings/sound/simple-card.txt b/Documentation/devicetree/bindings/sound/simple-card.txt index 17c13e74667da8d2135fa3360926909b6dbc569b..a4c72d09cd45718e6ce5197d6f8fce1e782deeac 100644 --- a/Documentation/devicetree/bindings/sound/simple-card.txt +++ b/Documentation/devicetree/bindings/sound/simple-card.txt @@ -86,6 +86,11 @@ Optional CPU/CODEC subnodes properties: in dai startup() and disabled with clk_disable_unprepare() in dai shutdown(). + If a clock is specified and a + multiplication factor is given with + mclk-fs, the clock will be set to the + calculated mclk frequency when the + stream starts. - system-clock-direction-out : specifies clock direction as 'out' on initialization. It is useful for some aCPUs with fixed clocks. diff --git a/Documentation/devicetree/bindings/sound/st,stm32-i2s.txt b/Documentation/devicetree/bindings/sound/st,stm32-i2s.txt index 4bda52042402600cfdab48c466e84d35d2f5bea4..58c341300552578018be27280517d36a24da35e5 100644 --- a/Documentation/devicetree/bindings/sound/st,stm32-i2s.txt +++ b/Documentation/devicetree/bindings/sound/st,stm32-i2s.txt @@ -18,7 +18,7 @@ Required properties: See Documentation/devicetree/bindings/dma/stm32-dma.txt. - dma-names: Identifier for each DMA request line. Must be "tx" and "rx". - pinctrl-names: should contain only value "default" - - pinctrl-0: see Documentation/devicetree/bindings/pinctrl/pinctrl-stm32.txt + - pinctrl-0: see Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.txt Optional properties: - resets: Reference to a reset controller asserting the reset controller diff --git a/Documentation/devicetree/bindings/sound/st,stm32-sai.txt b/Documentation/devicetree/bindings/sound/st,stm32-sai.txt index f301cdf0b7e68942172253f077a93c8abb923b95..3a3fc506e43ae8ce727e03f14f0073a10008adfa 100644 --- a/Documentation/devicetree/bindings/sound/st,stm32-sai.txt +++ b/Documentation/devicetree/bindings/sound/st,stm32-sai.txt @@ -37,7 +37,7 @@ SAI subnodes required properties: "tx": if sai sub-block is configured as playback DAI "rx": if sai sub-block is configured as capture DAI - pinctrl-names: should contain only value "default" - - pinctrl-0: see Documentation/devicetree/bindings/pinctrl/pinctrl-stm32.txt + - pinctrl-0: see Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.txt SAI subnodes Optional properties: - st,sync: specify synchronization mode. diff --git a/Documentation/devicetree/bindings/sound/ti,tas6424.txt b/Documentation/devicetree/bindings/sound/ti,tas6424.txt index 1c4ada0eef4e1613d2a88f79e3d572c65e64b1e9..eacb54f34188fa6cf10f3ad101b1c70bf590a488 100644 --- a/Documentation/devicetree/bindings/sound/ti,tas6424.txt +++ b/Documentation/devicetree/bindings/sound/ti,tas6424.txt @@ -6,6 +6,8 @@ Required properties: - compatible: "ti,tas6424" - TAS6424 - reg: I2C slave address - sound-dai-cells: must be equal to 0 + - standby-gpios: GPIO used to shut the TAS6424 down. + - mute-gpios: GPIO used to mute all the outputs Example: diff --git a/Documentation/devicetree/bindings/sound/tscs42xx.txt b/Documentation/devicetree/bindings/sound/tscs42xx.txt index 2ac2f099669761354469f9550700c00b61e7c91e..7eea32e9d07838aec8d7f30b4e88b39c69da24d3 100644 --- a/Documentation/devicetree/bindings/sound/tscs42xx.txt +++ b/Documentation/devicetree/bindings/sound/tscs42xx.txt @@ -8,9 +8,15 @@ Required Properties: - reg : <0x71> for analog mic <0x69> for digital mic + - clock-names: Must one of the following "mclk1", "xtal", "mclk2" + + - clocks: phandle of the clock that provides the codec sysclk + Example: wookie: codec@69 { compatible = "tempo,tscs42A2"; reg = <0x69>; + clock-names = "xtal"; + clocks = <&audio_xtal>; }; diff --git a/Documentation/devicetree/bindings/sound/tscs454.txt b/Documentation/devicetree/bindings/sound/tscs454.txt new file mode 100644 index 0000000000000000000000000000000000000000..3ba3e2d2c20666c5f3cf7651ca117462eac7faf7 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/tscs454.txt @@ -0,0 +1,23 @@ +TSCS454 Audio CODEC + +Required Properties: + + - compatible : "tempo,tscs454" + + - reg : <0x69> + + - clock-names: Must one of the following "xtal", "mclk1", "mclk2" + + - clocks: phandle of the clock that provides the codec sysclk + + Note: If clock is not provided then bit clock is assumed + +Example: + +redwood: codec@69 { + #sound-dai-cells = <1>; + compatible = "tempo,tscs454"; + reg = <0x69>; + clock-names = "mclk1"; + clocks = <&audio_mclk>; +}; diff --git a/Documentation/devicetree/bindings/sound/wm8510.txt b/Documentation/devicetree/bindings/sound/wm8510.txt index fa1a32b85577db40322c4ee9655a9da77d983261..e6b6cc041f895b21d22cfc8a7e73277d2716fff1 100644 --- a/Documentation/devicetree/bindings/sound/wm8510.txt +++ b/Documentation/devicetree/bindings/sound/wm8510.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8510@1a { +wm8510: codec@1a { compatible = "wlf,wm8510"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8523.txt b/Documentation/devicetree/bindings/sound/wm8523.txt index 04746186b2831558c7e9f3b27589cbf4d13475da..f3a6485f4b8abf1b149c944d1f3486936f6529e0 100644 --- a/Documentation/devicetree/bindings/sound/wm8523.txt +++ b/Documentation/devicetree/bindings/sound/wm8523.txt @@ -10,7 +10,7 @@ Required properties: Example: -codec: wm8523@1a { +wm8523: codec@1a { compatible = "wlf,wm8523"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8524.txt b/Documentation/devicetree/bindings/sound/wm8524.txt index 0f0553563fc1c265ea759d1c7348ad1c71f97c57..f6c0c263b135add1539054bb3beee8414da11c63 100644 --- a/Documentation/devicetree/bindings/sound/wm8524.txt +++ b/Documentation/devicetree/bindings/sound/wm8524.txt @@ -10,7 +10,7 @@ Required properties: Example: -codec: wm8524 { +wm8524: codec { compatible = "wlf,wm8524"; wlf,mute-gpios = <&gpio1 8 GPIO_ACTIVE_LOW>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8580.txt b/Documentation/devicetree/bindings/sound/wm8580.txt index 78fce9b14954ef76f9726b98486674d35b4ae934..ff3f9f5f2111f7afc832fcf6b280468dad843eb1 100644 --- a/Documentation/devicetree/bindings/sound/wm8580.txt +++ b/Documentation/devicetree/bindings/sound/wm8580.txt @@ -10,7 +10,7 @@ Required properties: Example: -codec: wm8580@1a { +wm8580: codec@1a { compatible = "wlf,wm8580"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8711.txt b/Documentation/devicetree/bindings/sound/wm8711.txt index 8ed9998cd23c0ad1f0357152f09b4360d52ec74a..c30a1387c4bff4f08a081d1a711b43d6f8e2a53e 100644 --- a/Documentation/devicetree/bindings/sound/wm8711.txt +++ b/Documentation/devicetree/bindings/sound/wm8711.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8711@1a { +wm8711: codec@1a { compatible = "wlf,wm8711"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8728.txt b/Documentation/devicetree/bindings/sound/wm8728.txt index a8b5c3668e602537faf0bf1189fcd006286af3a4..a3608b4c78b95ce2a3998de2df43e6a3d8fbc4d5 100644 --- a/Documentation/devicetree/bindings/sound/wm8728.txt +++ b/Documentation/devicetree/bindings/sound/wm8728.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8728@1a { +wm8728: codec@1a { compatible = "wlf,wm8728"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8731.txt b/Documentation/devicetree/bindings/sound/wm8731.txt index 236690e99b87fbc81c0cb5b1823981cf1e87ff4d..f660d9bb0e698eb0a9d141541709071ff745b418 100644 --- a/Documentation/devicetree/bindings/sound/wm8731.txt +++ b/Documentation/devicetree/bindings/sound/wm8731.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8731@1a { +wm8731: codec@1a { compatible = "wlf,wm8731"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8737.txt b/Documentation/devicetree/bindings/sound/wm8737.txt index 4bc2cea3b14085cf4b95c62065db7bcdce4e771c..eda1ec6a7563a04a4b83bc3fafea57d91b4ec9a4 100644 --- a/Documentation/devicetree/bindings/sound/wm8737.txt +++ b/Documentation/devicetree/bindings/sound/wm8737.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8737@1a { +wm8737: codec@1a { compatible = "wlf,wm8737"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8741.txt b/Documentation/devicetree/bindings/sound/wm8741.txt index a133154087194cb90a2056ba6f927d67b6e49ba4..b69e196c741c4f8d13b051e16b06c904e64c25f1 100644 --- a/Documentation/devicetree/bindings/sound/wm8741.txt +++ b/Documentation/devicetree/bindings/sound/wm8741.txt @@ -21,7 +21,7 @@ Optional properties: Example: -codec: wm8741@1a { +wm8741: codec@1a { compatible = "wlf,wm8741"; reg = <0x1a>; diff --git a/Documentation/devicetree/bindings/sound/wm8750.txt b/Documentation/devicetree/bindings/sound/wm8750.txt index 8db239fd5ecde354e559e56ef8f32194336a1586..682f221f6f38e3615d043070a26585916e096dc7 100644 --- a/Documentation/devicetree/bindings/sound/wm8750.txt +++ b/Documentation/devicetree/bindings/sound/wm8750.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8750@1a { +wm8750: codec@1a { compatible = "wlf,wm8750"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8753.txt b/Documentation/devicetree/bindings/sound/wm8753.txt index 8eee6128210552e0d8decfedeffbc0b216a9ec21..eca9e5a825a96e5ff17f682a83ffeef7f33a896b 100644 --- a/Documentation/devicetree/bindings/sound/wm8753.txt +++ b/Documentation/devicetree/bindings/sound/wm8753.txt @@ -34,7 +34,7 @@ Pins on the device (for linking into audio routes): Example: -codec: wm8753@1a { +wm8753: codec@1a { compatible = "wlf,wm8753"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8770.txt b/Documentation/devicetree/bindings/sound/wm8770.txt index 866e00ca150b4b04e573e204a3ad39a0c057a58a..cac762a1105ded99ebe0966d2c4f0c67bbba7dc6 100644 --- a/Documentation/devicetree/bindings/sound/wm8770.txt +++ b/Documentation/devicetree/bindings/sound/wm8770.txt @@ -10,7 +10,7 @@ Required properties: Example: -codec: wm8770@1 { +wm8770: codec@1 { compatible = "wlf,wm8770"; reg = <1>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8776.txt b/Documentation/devicetree/bindings/sound/wm8776.txt index 3b9ca49abc2b146a8bacafb0e182b8773b6cb4c3..01173369c3ed300a89dd86c288757ddfae0e37b7 100644 --- a/Documentation/devicetree/bindings/sound/wm8776.txt +++ b/Documentation/devicetree/bindings/sound/wm8776.txt @@ -12,7 +12,7 @@ Required properties: Example: -codec: wm8776@1a { +wm8776: codec@1a { compatible = "wlf,wm8776"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8804.txt b/Documentation/devicetree/bindings/sound/wm8804.txt index 6fd124b1649652dfaac67bf0efbf35f0bade58e4..2c1641c17a91efc97c0a6194724da559849721cb 100644 --- a/Documentation/devicetree/bindings/sound/wm8804.txt +++ b/Documentation/devicetree/bindings/sound/wm8804.txt @@ -19,7 +19,7 @@ Optional properties: Example: -codec: wm8804@1a { +wm8804: codec@1a { compatible = "wlf,wm8804"; reg = <0x1a>; }; diff --git a/Documentation/devicetree/bindings/sound/wm8903.txt b/Documentation/devicetree/bindings/sound/wm8903.txt index afc51caf113705e4da112635385c205f290442ac..6371c2434afe75f0dbfd7228f1b905879fc0288d 100644 --- a/Documentation/devicetree/bindings/sound/wm8903.txt +++ b/Documentation/devicetree/bindings/sound/wm8903.txt @@ -57,7 +57,7 @@ Pins on the device (for linking into audio routes): Example: -codec: wm8903@1a { +wm8903: codec@1a { compatible = "wlf,wm8903"; reg = <0x1a>; interrupts = < 347 >; diff --git a/Documentation/devicetree/bindings/sound/wm8960.txt b/Documentation/devicetree/bindings/sound/wm8960.txt index 2deb8a3da9c55662882f55abdd5da719ca5c5f6a..6d29ac3750eeea29da05a361b527580e65d6f76f 100644 --- a/Documentation/devicetree/bindings/sound/wm8960.txt +++ b/Documentation/devicetree/bindings/sound/wm8960.txt @@ -23,7 +23,7 @@ Optional properties: Example: -codec: wm8960@1a { +wm8960: codec@1a { compatible = "wlf,wm8960"; reg = <0x1a>; diff --git a/Documentation/devicetree/bindings/sound/wm8962.txt b/Documentation/devicetree/bindings/sound/wm8962.txt index 7f82b59ec8f947f336ec6515d88839cbe7c3bb54..dcfa9a3369fde640da518de0f61466a1a54235c4 100644 --- a/Documentation/devicetree/bindings/sound/wm8962.txt +++ b/Documentation/devicetree/bindings/sound/wm8962.txt @@ -24,7 +24,7 @@ Optional properties: Example: -codec: wm8962@1a { +wm8962: codec@1a { compatible = "wlf,wm8962"; reg = <0x1a>; diff --git a/Documentation/devicetree/bindings/sound/wm8994.txt b/Documentation/devicetree/bindings/sound/wm8994.txt index 68c4e8d96bed68c4c3d0ea09b10cbe8cb8170515..4a9dead1b7d32e67b6dd51c081574c4f59b3bb86 100644 --- a/Documentation/devicetree/bindings/sound/wm8994.txt +++ b/Documentation/devicetree/bindings/sound/wm8994.txt @@ -59,7 +59,7 @@ Optional properties: Example: -codec: wm8994@1a { +wm8994: codec@1a { compatible = "wlf,wm8994"; reg = <0x1a>; diff --git a/Documentation/devicetree/bindings/spi/spi-st-ssc.txt b/Documentation/devicetree/bindings/spi/spi-st-ssc.txt index fe54959ec95774407471885f89e9aacc2cb59b20..1bdc4709e4743c5599d790e09ff68295c05f9fa0 100644 --- a/Documentation/devicetree/bindings/spi/spi-st-ssc.txt +++ b/Documentation/devicetree/bindings/spi/spi-st-ssc.txt @@ -9,7 +9,7 @@ Required properties: - clocks : Must contain an entry for each name in clock-names See ../clk/* - pinctrl-names : Uses "default", can use "sleep" if provided - See ../pinctrl/pinctrl-binding.txt + See ../pinctrl/pinctrl-bindings.txt Optional properties: - cs-gpios : List of GPIO chip selects diff --git a/Documentation/devicetree/bindings/submitting-patches.txt b/Documentation/devicetree/bindings/submitting-patches.txt index 274058c583dde67297f800087c9e2ecaf7a57eac..de0d6090c0fd200f241aa0c71b5d208c62bde45c 100644 --- a/Documentation/devicetree/bindings/submitting-patches.txt +++ b/Documentation/devicetree/bindings/submitting-patches.txt @@ -6,7 +6,14 @@ I. For patch submitters 0) Normal patch submission rules from Documentation/process/submitting-patches.rst applies. - 1) The Documentation/ portion of the patch should be a separate patch. + 1) The Documentation/ and include/dt-bindings/ portion of the patch should + be a separate patch. The preferred subject prefix for binding patches is: + + "dt-bindings: : ..." + + The 80 characters of the subject are precious. It is recommended to not + use "Documentation" or "doc" because that is implied. All bindings are + docs. Repeating "binding" again should also be avoided. 2) Submit the entire series to the devicetree mailinglist at diff --git a/Documentation/devicetree/bindings/thermal/exynos-thermal.txt b/Documentation/devicetree/bindings/thermal/exynos-thermal.txt index b957acff57aa448071e0deda91a3e11eb78ca184..ad648d93d9613fd3c72d70496d47618d1ee18e0a 100644 --- a/Documentation/devicetree/bindings/thermal/exynos-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/exynos-thermal.txt @@ -12,7 +12,6 @@ "samsung,exynos5420-tmu-ext-triminfo" for TMU channels 2, 3 and 4 Exynos5420 (Must pass triminfo base and triminfo clock) "samsung,exynos5433-tmu" - "samsung,exynos5440-tmu" "samsung,exynos7-tmu" - interrupt-parent : The phandle for the interrupt controller - reg : Address range of the thermal registers. For soc's which has multiple @@ -68,18 +67,7 @@ Example 1): #thermal-sensor-cells = <0>; }; -Example 2): - - tmuctrl_0: tmuctrl@160118 { - compatible = "samsung,exynos5440-tmu"; - reg = <0x160118 0x230>, <0x160368 0x10>; - interrupts = <0 58 0>; - clocks = <&clock 21>; - clock-names = "tmu_apbif"; - #thermal-sensor-cells = <0>; - }; - -Example 3): (In case of Exynos5420 "with misplaced TRIMINFO register") +Example 2): (In case of Exynos5420 "with misplaced TRIMINFO register") tmu_cpu2: tmu@10068000 { compatible = "samsung,exynos5420-tmu-ext-triminfo"; reg = <0x10068000 0x100>, <0x1006c000 0x4>; diff --git a/Documentation/devicetree/bindings/thermal/imx-thermal.txt b/Documentation/devicetree/bindings/thermal/imx-thermal.txt index 379eb763073e68d4a63b540c6ca3384a3b09f642..823e4176eef8f7399eff8e9a8ded841c54bd8d4c 100644 --- a/Documentation/devicetree/bindings/thermal/imx-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/imx-thermal.txt @@ -1,8 +1,13 @@ * Temperature Monitor (TEMPMON) on Freescale i.MX SoCs Required properties: -- compatible : "fsl,imx6q-tempmon" for i.MX6Q, "fsl,imx6sx-tempmon" for i.MX6SX. - i.MX6SX has two more IRQs than i.MX6Q, one is IRQ_LOW and the other is IRQ_PANIC, +- compatible : must be one of following: + - "fsl,imx6q-tempmon" for i.MX6Q, + - "fsl,imx6sx-tempmon" for i.MX6SX, + - "fsl,imx7d-tempmon" for i.MX7S/D. +- interrupts : the interrupt output of the controller: + i.MX6Q has one IRQ which will be triggered when temperature is higher than high threshold, + i.MX6SX and i.MX7S/D have two more IRQs than i.MX6Q, one is IRQ_LOW and the other is IRQ_PANIC, when temperature is below than low threshold, IRQ_LOW will be triggered, when temperature is higher than panic threshold, system will auto reboot by SRC module. - fsl,tempmon : phandle pointer to system controller that contains TEMPMON diff --git a/Documentation/devicetree/bindings/thermal/mediatek-thermal.txt b/Documentation/devicetree/bindings/thermal/mediatek-thermal.txt index 0d73ea5e9c0c41da00f34d98fcb86ea5d27e200b..41d6a443ad660a1e969e765b19ec253e5f04530d 100644 --- a/Documentation/devicetree/bindings/thermal/mediatek-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/mediatek-thermal.txt @@ -12,6 +12,7 @@ Required properties: - "mediatek,mt8173-thermal" : For MT8173 family of SoCs - "mediatek,mt2701-thermal" : For MT2701 family of SoCs - "mediatek,mt2712-thermal" : For MT2712 family of SoCs + - "mediatek,mt7622-thermal" : For MT7622 SoC - reg: Address range of the thermal controller - interrupts: IRQ for the thermal controller - clocks, clock-names: Clocks needed for the thermal controller. required diff --git a/Documentation/devicetree/bindings/thermal/qcom-tsens.txt b/Documentation/devicetree/bindings/thermal/qcom-tsens.txt index 292ed89d900bbc6d19d6357c799b083fe0fcf9a2..06195e8f35e287b870920ecb4f4dfed799625231 100644 --- a/Documentation/devicetree/bindings/thermal/qcom-tsens.txt +++ b/Documentation/devicetree/bindings/thermal/qcom-tsens.txt @@ -8,6 +8,7 @@ Required properties: - reg: Address range of the thermal registers - #thermal-sensor-cells : Should be 1. See ./thermal.txt for a description. +- #qcom,sensors: Number of sensors in tsens block - Refer to Documentation/devicetree/bindings/nvmem/nvmem.txt to know how to specify nvmem cells diff --git a/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt b/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt index fdf5caa6229b4fffe95853e061e9c62c83307364..cfa154bb0fa7c7ec5977de4c38c92a7ebd596e7e 100644 --- a/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/rcar-gen3-thermal.txt @@ -9,6 +9,7 @@ Required properties: Examples with soctypes are: - "renesas,r8a7795-thermal" (R-Car H3) - "renesas,r8a7796-thermal" (R-Car M3-W) + - "renesas,r8a77965-thermal" (R-Car M3-N) - reg : Address ranges of the thermal registers. Each sensor needs one address range. Sorting must be done in increasing order according to datasheet, i.e. @@ -18,7 +19,7 @@ Required properties: Optional properties: -- interrupts : interrupts routed to the TSC (3 for H3 and M3-W) +- interrupts : interrupts routed to the TSC (3 for H3, M3-W and M3-N) - power-domain : Must contain a reference to the power domain. This property is mandatory if the thermal sensor instance is part of a controllable power domain. @@ -27,9 +28,9 @@ Example: tsc: thermal@e6198000 { compatible = "renesas,r8a7795-thermal"; - reg = <0 0xe6198000 0 0x68>, - <0 0xe61a0000 0 0x5c>, - <0 0xe61a8000 0 0x5c>; + reg = <0 0xe6198000 0 0x100>, + <0 0xe61a0000 0 0x100>, + <0 0xe61a8000 0 0x100>; interrupts = , , ; diff --git a/Documentation/devicetree/bindings/thermal/rcar-thermal.txt b/Documentation/devicetree/bindings/thermal/rcar-thermal.txt index 349e635f2d87e0a8421bcaaea1ef18e12b73dff2..67c563f1b4c42131157e8428300c5a12d39bbc2f 100644 --- a/Documentation/devicetree/bindings/thermal/rcar-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/rcar-thermal.txt @@ -3,7 +3,8 @@ Required properties: - compatible : "renesas,thermal-", "renesas,rcar-gen2-thermal" (with thermal-zone) or - "renesas,rcar-thermal" (without thermal-zone) as fallback. + "renesas,rcar-thermal" (without thermal-zone) as + fallback except R-Car D3. Examples with soctypes are: - "renesas,thermal-r8a73a4" (R-Mobile APE6) - "renesas,thermal-r8a7743" (RZ/G1M) @@ -12,13 +13,15 @@ Required properties: - "renesas,thermal-r8a7791" (R-Car M2-W) - "renesas,thermal-r8a7792" (R-Car V2H) - "renesas,thermal-r8a7793" (R-Car M2-N) + - "renesas,thermal-r8a77995" (R-Car D3) - reg : Address range of the thermal registers. The 1st reg will be recognized as common register if it has "interrupts". Option properties: -- interrupts : use interrupt +- interrupts : If present should contain 3 interrupts for + R-Car D3 or 1 interrupt otherwise. Example (non interrupt support): diff --git a/Documentation/devicetree/bindings/thermal/uniphier-thermal.txt b/Documentation/devicetree/bindings/thermal/uniphier-thermal.txt index 686c0b42ed3f6c7debe89e2a25d129226bd262b8..ceb92a95727a8aadb79dbfe9e222ced65449d6c7 100644 --- a/Documentation/devicetree/bindings/thermal/uniphier-thermal.txt +++ b/Documentation/devicetree/bindings/thermal/uniphier-thermal.txt @@ -8,6 +8,7 @@ Required properties: - compatible : - "socionext,uniphier-pxs2-thermal" : For UniPhier PXs2 SoC - "socionext,uniphier-ld20-thermal" : For UniPhier LD20 SoC + - "socionext,uniphier-pxs3-thermal" : For UniPhier PXs3 SoC - interrupts : IRQ for the temperature alarm - #thermal-sensor-cells : Should be 0. See ./thermal.txt for details. diff --git a/Documentation/devicetree/bindings/nios2/timer.txt b/Documentation/devicetree/bindings/timer/altr,timer-1.0.txt similarity index 100% rename from Documentation/devicetree/bindings/nios2/timer.txt rename to Documentation/devicetree/bindings/timer/altr,timer-1.0.txt diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/timer/arm,arch_timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/arch_timer.txt rename to Documentation/devicetree/bindings/timer/arm,arch_timer.txt diff --git a/Documentation/devicetree/bindings/arm/armv7m_systick.txt b/Documentation/devicetree/bindings/timer/arm,armv7m-systick.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/armv7m_systick.txt rename to Documentation/devicetree/bindings/timer/arm,armv7m-systick.txt diff --git a/Documentation/devicetree/bindings/arm/global_timer.txt b/Documentation/devicetree/bindings/timer/arm,global_timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/global_timer.txt rename to Documentation/devicetree/bindings/timer/arm,global_timer.txt diff --git a/Documentation/devicetree/bindings/arm/twd.txt b/Documentation/devicetree/bindings/timer/arm,twd.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/twd.txt rename to Documentation/devicetree/bindings/timer/arm,twd.txt diff --git a/Documentation/devicetree/bindings/powerpc/fsl/gtm.txt b/Documentation/devicetree/bindings/timer/fsl,gtm.txt similarity index 100% rename from Documentation/devicetree/bindings/powerpc/fsl/gtm.txt rename to Documentation/devicetree/bindings/timer/fsl,gtm.txt diff --git a/Documentation/devicetree/bindings/arm/mrvl/timer.txt b/Documentation/devicetree/bindings/timer/mrvl,mmp-timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/mrvl/timer.txt rename to Documentation/devicetree/bindings/timer/mrvl,mmp-timer.txt diff --git a/Documentation/devicetree/bindings/arm/msm/timer.txt b/Documentation/devicetree/bindings/timer/qcom,msm-timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/msm/timer.txt rename to Documentation/devicetree/bindings/timer/qcom,msm-timer.txt diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt index d740989eb56981cbce66e2cd6930481d452f1f97..b40add2d9bb485ecd3ea395669da1e43e9e31a1e 100644 --- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt +++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt @@ -22,6 +22,10 @@ Required Properties: - "renesas,r8a73a4-cmt0" for the 32-bit CMT0 device included in r8a73a4. - "renesas,r8a73a4-cmt1" for the 48-bit CMT1 device included in r8a73a4. + - "renesas,r8a7743-cmt0" for the 32-bit CMT0 device included in r8a7743. + - "renesas,r8a7743-cmt1" for the 48-bit CMT1 device included in r8a7743. + - "renesas,r8a7745-cmt0" for the 32-bit CMT0 device included in r8a7745. + - "renesas,r8a7745-cmt1" for the 48-bit CMT1 device included in r8a7745. - "renesas,r8a7790-cmt0" for the 32-bit CMT0 device included in r8a7790. - "renesas,r8a7790-cmt1" for the 48-bit CMT1 device included in r8a7790. - "renesas,r8a7791-cmt0" for the 32-bit CMT0 device included in r8a7791. @@ -31,10 +35,12 @@ Required Properties: - "renesas,r8a7794-cmt0" for the 32-bit CMT0 device included in r8a7794. - "renesas,r8a7794-cmt1" for the 48-bit CMT1 device included in r8a7794. - - "renesas,rcar-gen2-cmt0" for 32-bit CMT0 devices included in R-Car Gen2. - - "renesas,rcar-gen2-cmt1" for 48-bit CMT1 devices included in R-Car Gen2. - These are fallbacks for r8a73a4 and all the R-Car Gen2 - entries listed above. + - "renesas,rcar-gen2-cmt0" for 32-bit CMT0 devices included in R-Car Gen2 + and RZ/G1. + - "renesas,rcar-gen2-cmt1" for 48-bit CMT1 devices included in R-Car Gen2 + and RZ/G1. + These are fallbacks for r8a73a4, R-Car Gen2 and RZ/G1 entries + listed above. - reg: base address and length of the registers block for the timer module. - interrupts: interrupt-specifier for the timer, one per channel. diff --git a/Documentation/devicetree/bindings/arm/spear-timer.txt b/Documentation/devicetree/bindings/timer/st,spear-timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/spear-timer.txt rename to Documentation/devicetree/bindings/timer/st,spear-timer.txt diff --git a/Documentation/devicetree/bindings/c6x/timer64.txt b/Documentation/devicetree/bindings/timer/ti,c64x+timer64.txt similarity index 100% rename from Documentation/devicetree/bindings/c6x/timer64.txt rename to Documentation/devicetree/bindings/timer/ti,c64x+timer64.txt diff --git a/Documentation/devicetree/bindings/arm/omap/timer.txt b/Documentation/devicetree/bindings/timer/ti,timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/omap/timer.txt rename to Documentation/devicetree/bindings/timer/ti,timer.txt diff --git a/Documentation/devicetree/bindings/arm/vt8500/via,vt8500-timer.txt b/Documentation/devicetree/bindings/timer/via,vt8500-timer.txt similarity index 100% rename from Documentation/devicetree/bindings/arm/vt8500/via,vt8500-timer.txt rename to Documentation/devicetree/bindings/timer/via,vt8500-timer.txt diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt index 0e03344e2e8bb3b9567f00dd5bfae3b2f1021bb3..2e9318151df79712bcfec96de8d9994e9415d66b 100644 --- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt +++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt @@ -76,6 +76,10 @@ Optional properties: needs to make sure it does not send more than 90% maximum_periodic_data_per_frame. The use case is multiple transactions, but less frame rate. +- mux-controls: The mux control for toggling host/device output of this + controller. It's expected that a mux state of 0 indicates device mode and a + mux state of 1 indicates host mode. +- mux-control-names: Shall be "usb_switch" if mux-controls is specified. i.mx specific properties - fsl,usbmisc: phandler of non-core register device, with one @@ -102,4 +106,6 @@ Example: rx-burst-size-dword = <0x10>; extcon = <0>, <&usb_id>; phy-clkgate-delay-us = <400>; + mux-controls = <&usb_switch>; + mux-control-names = "usb_switch"; }; diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt index 0dbd3083e7ddf25740ea35705258a3b7ef8dfc04..7f13ebef06cb57cecce50c1b10edea421a372398 100644 --- a/Documentation/devicetree/bindings/usb/dwc3.txt +++ b/Documentation/devicetree/bindings/usb/dwc3.txt @@ -7,6 +7,26 @@ Required properties: - compatible: must be "snps,dwc3" - reg : Address and length of the register set for the device - interrupts: Interrupts used by the dwc3 controller. + - clock-names: should contain "ref", "bus_early", "suspend" + - clocks: list of phandle and clock specifier pairs corresponding to + entries in the clock-names property. + +Exception for clocks: + clocks are optional if the parent node (i.e. glue-layer) is compatible to + one of the following: + "amlogic,meson-axg-dwc3" + "amlogic,meson-gxl-dwc3" + "cavium,octeon-7130-usb-uctl" + "qcom,dwc3" + "samsung,exynos5250-dwusb3" + "samsung,exynos7-dwusb3" + "sprd,sc9860-dwc3" + "st,stih407-dwc3" + "ti,am437x-dwc3" + "ti,dwc3" + "ti,keystone-dwc3" + "rockchip,rk3399-dwc3" + "xlnx,zynqmp-dwc3" Optional properties: - usb-phy : array of phandle for the PHY device. The first element @@ -15,6 +35,7 @@ Optional properties: - phys: from the *Generic PHY* bindings - phy-names: from the *Generic PHY* bindings; supported names are "usb2-phy" or "usb3-phy". + - resets: a single pair of phandle and reset specifier - snps,usb3_lpm_capable: determines if platform is USB3 LPM capable - snps,disable_scramble_quirk: true when SW should disable data scrambling. Only really useful for FPGA builds. diff --git a/Documentation/devicetree/bindings/usb/fcs,fusb302.txt b/Documentation/devicetree/bindings/usb/fcs,fusb302.txt index 472facfa5a719d3018e8425bb6779faeeb2c04f7..6087dc7f209e32d4298ce794b9461dd87f71a8df 100644 --- a/Documentation/devicetree/bindings/usb/fcs,fusb302.txt +++ b/Documentation/devicetree/bindings/usb/fcs,fusb302.txt @@ -6,12 +6,6 @@ Required properties : - interrupts : Interrupt specifier Optional properties : -- fcs,max-sink-microvolt : Maximum voltage to negotiate when configured as sink -- fcs,max-sink-microamp : Maximum current to negotiate when configured as sink -- fcs,max-sink-microwatt : Maximum power to negotiate when configured as sink - If this is less then max-sink-microvolt * - max-sink-microamp then the configured current will - be clamped. - fcs,operating-sink-microwatt : Minimum amount of power accepted from a sink when negotiating diff --git a/Documentation/devicetree/bindings/usb/hisilicon,histb-xhci.txt b/Documentation/devicetree/bindings/usb/hisilicon,histb-xhci.txt new file mode 100644 index 0000000000000000000000000000000000000000..f4633496b1221f21d61b9b87a13b281a1c153e5b --- /dev/null +++ b/Documentation/devicetree/bindings/usb/hisilicon,histb-xhci.txt @@ -0,0 +1,45 @@ +HiSilicon STB xHCI + +The device node for HiSilicon STB xHCI host controller + +Required properties: + - compatible: should be "hisilicon,hi3798cv200-xhci" + - reg: specifies physical base address and size of the registers + - interrupts : interrupt used by the controller + - clocks: a list of phandle + clock-specifier pairs, one for each + entry in clock-names + - clock-names: must contain + "bus": for bus clock + "utmi": for utmi clock + "pipe": for pipe clock + "suspend": for suspend clock + - resets: a list of phandle and reset specifier pairs as listed in + reset-names property. + - reset-names: must contain + "soft": for soft reset + - phys: a list of phandle + phy specifier pairs + - phy-names: must contain at least one of following: + "inno": for inno phy + "combo": for combo phy + +Optional properties: + - usb2-lpm-disable: indicate if we don't want to enable USB2 HW LPM + - usb3-lpm-capable: determines if platform is USB3 LPM capable + - imod-interval-ns: default interrupt moderation interval is 40000ns + +Example: + +xhci0: xchi@f98a0000 { + compatible = "hisilicon,hi3798cv200-xhci"; + reg = <0xf98a0000 0x10000>; + interrupts = ; + clocks = <&crg HISTB_USB3_BUS_CLK>, + <&crg HISTB_USB3_UTMI_CLK>, + <&crg HISTB_USB3_PIPE_CLK>, + <&crg HISTB_USB3_SUSPEND_CLK>; + clock-names = "bus", "utmi", "pipe", "suspend"; + resets = <&crg 0xb0 12>; + reset-names = "soft"; + phys = <&usb2_phy1_port1 0>, <&combphy0 PHY_TYPE_USB3>; + phy-names = "inno", "combo"; +}; diff --git a/Documentation/devicetree/bindings/usb/qcom,dwc3.txt b/Documentation/devicetree/bindings/usb/qcom,dwc3.txt index bc8a2fa5d2bfddaca308c7d9d6f9bcae7eb98b0f..95afdcf3c33725d21272a37eee480c0c7e3e6d13 100644 --- a/Documentation/devicetree/bindings/usb/qcom,dwc3.txt +++ b/Documentation/devicetree/bindings/usb/qcom,dwc3.txt @@ -1,54 +1,95 @@ Qualcomm SuperSpeed DWC3 USB SoC controller Required properties: -- compatible: should contain "qcom,dwc3" +- compatible: Compatible list, contains + "qcom,dwc3" + "qcom,msm8996-dwc3" for msm8996 SOC. + "qcom,sdm845-dwc3" for sdm845 SOC. +- reg: Offset and length of register set for QSCRATCH wrapper +- power-domains: specifies a phandle to PM domain provider node - clocks: A list of phandle + clock-specifier pairs for the clocks listed in clock-names -- clock-names: Should contain the following: +- clock-names: Should contain the following: "core" Master/Core clock, have to be >= 125 MHz for SS operation and >= 60MHz for HS operation + "mock_utmi" Mock utmi clock needed for ITP/SOF generation in + host mode. Its frequency should be 19.2MHz. + "sleep" Sleep clock, used for wakeup when USB3 core goes + into low power mode (U3). Optional clocks: - "iface" System bus AXI clock. Not present on all platforms - "sleep" Sleep clock, used when USB3 core goes into low - power mode (U3). + "iface" System bus AXI clock. + Not present on "qcom,msm8996-dwc3" compatible. + "cfg_noc" System Config NOC clock. + Not present on "qcom,msm8996-dwc3" compatible. +- assigned-clocks: Should be: + MOCK_UTMI_CLK + MASTER_CLK +- assigned-clock-rates: Should be: + 19.2Mhz (192000000) for MOCK_UTMI_CLK + >=125Mhz (125000000) for MASTER_CLK in SS mode + >=60Mhz (60000000) for MASTER_CLK in HS mode + +Optional properties: +- resets: Phandle to reset control that resets core and wrapper. +- interrupts: specifies interrupts from controller wrapper used + to wakeup from low power/susepnd state. Must contain + one or more entry for interrupt-names property +- interrupt-names: Must include the following entries: + - "hs_phy_irq": The interrupt that is asserted when a + wakeup event is received on USB2 bus + - "ss_phy_irq": The interrupt that is asserted when a + wakeup event is received on USB3 bus + - "dm_hs_phy_irq" and "dp_hs_phy_irq": Separate + interrupts for any wakeup event on DM and DP lines +- qcom,select-utmi-as-pipe-clk: if present, disable USB3 pipe_clk requirement. + Used when dwc3 operates without SSPHY and only + HS/FS/LS modes are supported. Required child node: A child node must exist to represent the core DWC3 IP block. The name of the node is not important. The content of the node is defined in dwc3.txt. Phy documentation is provided in the following places: -Documentation/devicetree/bindings/phy/qcom-dwc3-usb-phy.txt +Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt - USB3 QMP PHY +Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt - USB2 QUSB2 PHY Example device nodes: hs_phy: phy@100f8800 { - compatible = "qcom,dwc3-hs-usb-phy"; - reg = <0x100f8800 0x30>; - clocks = <&gcc USB30_0_UTMI_CLK>; - clock-names = "ref"; - #phy-cells = <0>; - + compatible = "qcom,qusb2-v2-phy"; + ... }; ss_phy: phy@100f8830 { - compatible = "qcom,dwc3-ss-usb-phy"; - reg = <0x100f8830 0x30>; - clocks = <&gcc USB30_0_MASTER_CLK>; - clock-names = "ref"; - #phy-cells = <0>; - + compatible = "qcom,qmp-v3-usb3-phy"; + ... }; - usb3_0: usb30@0 { + usb3_0: usb30@a6f8800 { compatible = "qcom,dwc3"; + reg = <0xa6f8800 0x400>; #address-cells = <1>; #size-cells = <1>; - clocks = <&gcc USB30_0_MASTER_CLK>; - clock-names = "core"; - ranges; + interrupts = <0 131 0>, <0 486 0>, <0 488 0>, <0 489 0>; + interrupt-names = "hs_phy_irq", "ss_phy_irq", + "dm_hs_phy_irq", "dp_hs_phy_irq"; + + clocks = <&gcc GCC_USB30_PRIM_MASTER_CLK>, + <&gcc GCC_USB30_PRIM_MOCK_UTMI_CLK>, + <&gcc GCC_USB30_PRIM_SLEEP_CLK>; + clock-names = "core", "mock_utmi", "sleep"; + + assigned-clocks = <&gcc GCC_USB30_PRIM_MOCK_UTMI_CLK>, + <&gcc GCC_USB30_PRIM_MASTER_CLK>; + assigned-clock-rates = <19200000>, <133000000>; + + resets = <&gcc GCC_USB30_PRIM_BCR>; + reset-names = "core_reset"; + power-domains = <&gcc USB30_PRIM_GDSC>; + qcom,select-utmi-as-pipe-clk; dwc3@10000000 { compatible = "snps,dwc3"; diff --git a/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt b/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt new file mode 100644 index 0000000000000000000000000000000000000000..09e847e92e5ef23ea24c6d0afe0f4bc002d86e85 --- /dev/null +++ b/Documentation/devicetree/bindings/usb/richtek,rt1711h.txt @@ -0,0 +1,17 @@ +Richtek RT1711H TypeC PD Controller. + +Required properties: + - compatible : Must be "richtek,rt1711h". + - reg : Must be 0x4e, it's slave address of RT1711H. + - interrupt-parent : the phandle for the interrupt controller that + provides interrupts for this device. + - interrupts : where a is the interrupt number and b represents an + encoding of the sense and level information for the interrupt. + +Example : +rt1711h@4e { + compatible = "richtek,rt1711h"; + reg = <0x4e>; + interrupt-parent = <&gpio26>; + interrupts = <0 IRQ_TYPE_LEVEL_LOW>; +}; diff --git a/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt index 50a31536e975ef915defb7fd35db914f3b0d6cf6..c8c4b00ecb941fe85144fb3efd8c5cfa4ec0e5e4 100644 --- a/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt +++ b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt @@ -16,7 +16,8 @@ A child node must exist to represent the core DWC3 IP block. The name of the node is not important. The content of the node is defined in dwc3.txt. Phy documentation is provided in the following places: -Documentation/devicetree/bindings/phy/rockchip,dwc3-usb-phy.txt +Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt - USB2.0 PHY +Documentation/devicetree/bindings/phy/phy-rockchip-typec.txt - Type-C PHY Example device nodes: diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index a38d8bfae19c3a7897032d1b80c83622ba92da5d..7cad066191eeb8e6c9711cb81fd50283362fcf4d 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -32,6 +32,7 @@ andestech Andes Technology Corporation apm Applied Micro Circuits Corporation (APM) aptina Aptina Imaging arasan Arasan Chip Systems +archermind ArcherMind Technology (Nanjing) Co., Ltd. arctic Arctic Sand aries Aries Embedded GmbH arm ARM Ltd. @@ -47,6 +48,7 @@ auvidea Auvidea GmbH avago Avago Technologies avia avia semiconductor avic Shanghai AVIC Optoelectronics Co., Ltd. +avnet Avnet, Inc. axentia Axentia Technologies AB axis Axis Communications AB bananapi BIPAI KEJI LIMITED @@ -56,6 +58,7 @@ bosch Bosch Sensortec GmbH boundary Boundary Devices Inc. brcm Broadcom Corporation buffalo Buffalo, Inc. +bticino Bticino International calxeda Calxeda capella Capella Microsystems, Inc cascoda Cascoda, Ltd. @@ -75,6 +78,7 @@ cnxt Conexant Systems, Inc. compulab CompuLab Ltd. cortina Cortina Systems, Inc. cosmic Cosmic Circuits +crane Crane Connectivity Solutions creative Creative Technology Ltd crystalfontz Crystalfontz America, Inc. cubietech Cubietech, Ltd. @@ -185,6 +189,7 @@ khadas Khadas kiebackpeter Kieback & Peter GmbH kinetic Kinetic Technologies kingnovel Kingnovel Technology Co., Ltd. +koe Kaohsiung Opto-Electronics Inc. kosagi Sutajio Ko-Usagi PTE Ltd. kyo Kyocera Corporation lacie LaCie @@ -199,11 +204,13 @@ linaro Linaro Limited linksys Belkin International, Inc. (Linksys) linux Linux-specific binding lltc Linear Technology Corporation +logicpd Logic PD, Inc. lsi LSI Corp. (LSI Logic) lwn Liebherr-Werk Nenzing GmbH macnica Macnica Americas marvell Marvell Technology Group Ltd. maxim Maxim Integrated Products +mbvl Mobiveil Inc. mcube mCube meas Measurement Specialties mediatek MediaTek Inc. @@ -279,6 +286,7 @@ pine64 Pine64 pixcir PIXCIR MICROELECTRONICS Co., Ltd plathome Plat'Home Co., Ltd. plda PLDA +portwell Portwell Inc. poslab Poslab Technology Co., Ltd. powervr PowerVR (deprecated, use img) probox2 PROBOX2 (by W2COMP Co., Ltd.) @@ -318,6 +326,7 @@ sgx SGX Sensortech sharp Sharp Corporation shimafuji Shimafuji Electric, Inc. si-en Si-En Technology Ltd. +sifive SiFive, Inc. sigma Sigma Designs, Inc. sii Seiko Instruments, Inc. sil Silicon Image @@ -393,6 +402,7 @@ vot Vision Optical Technology Co., Ltd. wd Western Digital Corp. wetek WeTek Electronics, limited. wexler Wexler +wi2wi Wi2Wi, Inc. winbond Winbond Electronics corp. winstar Winstar Display Corp. wlf Wolfson Microelectronics diff --git a/Documentation/devicetree/bindings/w1/w1-gpio.txt b/Documentation/devicetree/bindings/w1/w1-gpio.txt index 6e09c35d9f1a281a0046ed2c07dfce1f1312f48f..37091902a0210328e76582426eaec0eaa3a7ae3d 100644 --- a/Documentation/devicetree/bindings/w1/w1-gpio.txt +++ b/Documentation/devicetree/bindings/w1/w1-gpio.txt @@ -15,7 +15,7 @@ Optional properties: Examples: - onewire@0 { + onewire { compatible = "w1-gpio"; gpios = <&gpio 126 0>, <&gpio 105 0>; }; diff --git a/Documentation/devicetree/bindings/watchdog/ingenic,jz4740-wdt.txt b/Documentation/devicetree/bindings/watchdog/ingenic,jz4740-wdt.txt index cb44918f01a8b61b9ebf7444d372cf74c4a6a075..ce1cb72d53452df3ab226f3d68daaa5ae91663b2 100644 --- a/Documentation/devicetree/bindings/watchdog/ingenic,jz4740-wdt.txt +++ b/Documentation/devicetree/bindings/watchdog/ingenic,jz4740-wdt.txt @@ -3,10 +3,15 @@ Ingenic Watchdog Timer (WDT) Controller for JZ4740 & JZ4780 Required properties: compatible: "ingenic,jz4740-watchdog" or "ingenic,jz4780-watchdog" reg: Register address and length for watchdog registers +clocks: phandle to the RTC clock +clock-names: should be "rtc" Example: watchdog: jz4740-watchdog@10002000 { compatible = "ingenic,jz4740-watchdog"; - reg = <0x10002000 0x100>; + reg = <0x10002000 0x10>; + + clocks = <&cgu JZ4740_CLK_RTC>; + clock-names = "rtc"; }; diff --git a/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt b/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt index fa56697a1ba6e0f6b708c8a2af275d61d67117cc..f24d802b8056f6c6a726a758561ef471cbec4933 100644 --- a/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt +++ b/Documentation/devicetree/bindings/watchdog/renesas-wdt.txt @@ -1,19 +1,27 @@ Renesas Watchdog Timer (WDT) Controller Required properties: -- compatible : Should be "renesas,-wdt", and - "renesas,rcar-gen3-wdt" or "renesas,rza-wdt" as fallback. + - compatible : Must be "renesas,-wdt", followed by a generic + fallback compatible string when compatible with the generic + version. Examples with soctypes are: - - "renesas,r7s72100-wdt" (RZ/A1) + - "renesas,r8a7743-wdt" (RZ/G1M) + - "renesas,r8a7745-wdt" (RZ/G1E) + - "renesas,r8a7790-wdt" (R-Car H2) + - "renesas,r8a7791-wdt" (R-Car M2-W) + - "renesas,r8a7792-wdt" (R-Car V2H) + - "renesas,r8a7793-wdt" (R-Car M2-N) + - "renesas,r8a7794-wdt" (R-Car E2) - "renesas,r8a7795-wdt" (R-Car H3) - "renesas,r8a7796-wdt" (R-Car M3-W) - "renesas,r8a77965-wdt" (R-Car M3-N) - "renesas,r8a77970-wdt" (R-Car V3M) - "renesas,r8a77995-wdt" (R-Car D3) - - When compatible with the generic version, nodes must list the SoC-specific - version corresponding to the platform first, followed by the generic - version. + - "renesas,r7s72100-wdt" (RZ/A1) + The generic compatible string must be: + - "renesas,rza-wdt" for RZ/A + - "renesas,rcar-gen2-wdt" for R-Car Gen2 and RZ/G + - "renesas,rcar-gen3-wdt" for R-Car Gen3 - reg : Should contain WDT registers location and length - clocks : the clock feeding the watchdog timer. diff --git a/Documentation/driver-api/clk.rst b/Documentation/driver-api/clk.rst new file mode 100644 index 0000000000000000000000000000000000000000..593cca5058b144ae7c09d5c6f5def3fba2b07204 --- /dev/null +++ b/Documentation/driver-api/clk.rst @@ -0,0 +1,307 @@ +======================== +The Common Clk Framework +======================== + +:Author: Mike Turquette + +This document endeavours to explain the common clk framework details, +and how to port a platform over to this framework. It is not yet a +detailed explanation of the clock api in include/linux/clk.h, but +perhaps someday it will include that information. + +Introduction and interface split +================================ + +The common clk framework is an interface to control the clock nodes +available on various devices today. This may come in the form of clock +gating, rate adjustment, muxing or other operations. This framework is +enabled with the CONFIG_COMMON_CLK option. + +The interface itself is divided into two halves, each shielded from the +details of its counterpart. First is the common definition of struct +clk which unifies the framework-level accounting and infrastructure that +has traditionally been duplicated across a variety of platforms. Second +is a common implementation of the clk.h api, defined in +drivers/clk/clk.c. Finally there is struct clk_ops, whose operations +are invoked by the clk api implementation. + +The second half of the interface is comprised of the hardware-specific +callbacks registered with struct clk_ops and the corresponding +hardware-specific structures needed to model a particular clock. For +the remainder of this document any reference to a callback in struct +clk_ops, such as .enable or .set_rate, implies the hardware-specific +implementation of that code. Likewise, references to struct clk_foo +serve as a convenient shorthand for the implementation of the +hardware-specific bits for the hypothetical "foo" hardware. + +Tying the two halves of this interface together is struct clk_hw, which +is defined in struct clk_foo and pointed to within struct clk_core. This +allows for easy navigation between the two discrete halves of the common +clock interface. + +Common data structures and api +============================== + +Below is the common struct clk_core definition from +drivers/clk/clk.c, modified for brevity:: + + struct clk_core { + const char *name; + const struct clk_ops *ops; + struct clk_hw *hw; + struct module *owner; + struct clk_core *parent; + const char **parent_names; + struct clk_core **parents; + u8 num_parents; + u8 new_parent_index; + ... + }; + +The members above make up the core of the clk tree topology. The clk +api itself defines several driver-facing functions which operate on +struct clk. That api is documented in include/linux/clk.h. + +Platforms and devices utilizing the common struct clk_core use the struct +clk_ops pointer in struct clk_core to perform the hardware-specific parts of +the operations defined in clk-provider.h:: + + struct clk_ops { + int (*prepare)(struct clk_hw *hw); + void (*unprepare)(struct clk_hw *hw); + int (*is_prepared)(struct clk_hw *hw); + void (*unprepare_unused)(struct clk_hw *hw); + int (*enable)(struct clk_hw *hw); + void (*disable)(struct clk_hw *hw); + int (*is_enabled)(struct clk_hw *hw); + void (*disable_unused)(struct clk_hw *hw); + unsigned long (*recalc_rate)(struct clk_hw *hw, + unsigned long parent_rate); + long (*round_rate)(struct clk_hw *hw, + unsigned long rate, + unsigned long *parent_rate); + int (*determine_rate)(struct clk_hw *hw, + struct clk_rate_request *req); + int (*set_parent)(struct clk_hw *hw, u8 index); + u8 (*get_parent)(struct clk_hw *hw); + int (*set_rate)(struct clk_hw *hw, + unsigned long rate, + unsigned long parent_rate); + int (*set_rate_and_parent)(struct clk_hw *hw, + unsigned long rate, + unsigned long parent_rate, + u8 index); + unsigned long (*recalc_accuracy)(struct clk_hw *hw, + unsigned long parent_accuracy); + int (*get_phase)(struct clk_hw *hw); + int (*set_phase)(struct clk_hw *hw, int degrees); + void (*init)(struct clk_hw *hw); + void (*debug_init)(struct clk_hw *hw, + struct dentry *dentry); + }; + +Hardware clk implementations +============================ + +The strength of the common struct clk_core comes from its .ops and .hw pointers +which abstract the details of struct clk from the hardware-specific bits, and +vice versa. To illustrate consider the simple gateable clk implementation in +drivers/clk/clk-gate.c:: + + struct clk_gate { + struct clk_hw hw; + void __iomem *reg; + u8 bit_idx; + ... + }; + +struct clk_gate contains struct clk_hw hw as well as hardware-specific +knowledge about which register and bit controls this clk's gating. +Nothing about clock topology or accounting, such as enable_count or +notifier_count, is needed here. That is all handled by the common +framework code and struct clk_core. + +Let's walk through enabling this clk from driver code:: + + struct clk *clk; + clk = clk_get(NULL, "my_gateable_clk"); + + clk_prepare(clk); + clk_enable(clk); + +The call graph for clk_enable is very simple:: + + clk_enable(clk); + clk->ops->enable(clk->hw); + [resolves to...] + clk_gate_enable(hw); + [resolves struct clk gate with to_clk_gate(hw)] + clk_gate_set_bit(gate); + +And the definition of clk_gate_set_bit:: + + static void clk_gate_set_bit(struct clk_gate *gate) + { + u32 reg; + + reg = __raw_readl(gate->reg); + reg |= BIT(gate->bit_idx); + writel(reg, gate->reg); + } + +Note that to_clk_gate is defined as:: + + #define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw) + +This pattern of abstraction is used for every clock hardware +representation. + +Supporting your own clk hardware +================================ + +When implementing support for a new type of clock it is only necessary to +include the following header:: + + #include + +To construct a clk hardware structure for your platform you must define +the following:: + + struct clk_foo { + struct clk_hw hw; + ... hardware specific data goes here ... + }; + +To take advantage of your data you'll need to support valid operations +for your clk:: + + struct clk_ops clk_foo_ops { + .enable = &clk_foo_enable; + .disable = &clk_foo_disable; + }; + +Implement the above functions using container_of:: + + #define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw) + + int clk_foo_enable(struct clk_hw *hw) + { + struct clk_foo *foo; + + foo = to_clk_foo(hw); + + ... perform magic on foo ... + + return 0; + }; + +Below is a matrix detailing which clk_ops are mandatory based upon the +hardware capabilities of that clock. A cell marked as "y" means +mandatory, a cell marked as "n" implies that either including that +callback is invalid or otherwise unnecessary. Empty cells are either +optional or must be evaluated on a case-by-case basis. + +.. table:: clock hardware characteristics + + +----------------+------+-------------+---------------+-------------+------+ + | | gate | change rate | single parent | multiplexer | root | + +================+======+=============+===============+=============+======+ + |.prepare | | | | | | + +----------------+------+-------------+---------------+-------------+------+ + |.unprepare | | | | | | + +----------------+------+-------------+---------------+-------------+------+ + +----------------+------+-------------+---------------+-------------+------+ + |.enable | y | | | | | + +----------------+------+-------------+---------------+-------------+------+ + |.disable | y | | | | | + +----------------+------+-------------+---------------+-------------+------+ + |.is_enabled | y | | | | | + +----------------+------+-------------+---------------+-------------+------+ + +----------------+------+-------------+---------------+-------------+------+ + |.recalc_rate | | y | | | | + +----------------+------+-------------+---------------+-------------+------+ + |.round_rate | | y [1]_ | | | | + +----------------+------+-------------+---------------+-------------+------+ + |.determine_rate | | y [1]_ | | | | + +----------------+------+-------------+---------------+-------------+------+ + |.set_rate | | y | | | | + +----------------+------+-------------+---------------+-------------+------+ + +----------------+------+-------------+---------------+-------------+------+ + |.set_parent | | | n | y | n | + +----------------+------+-------------+---------------+-------------+------+ + |.get_parent | | | n | y | n | + +----------------+------+-------------+---------------+-------------+------+ + +----------------+------+-------------+---------------+-------------+------+ + |.recalc_accuracy| | | | | | + +----------------+------+-------------+---------------+-------------+------+ + +----------------+------+-------------+---------------+-------------+------+ + |.init | | | | | | + +----------------+------+-------------+---------------+-------------+------+ + +.. [1] either one of round_rate or determine_rate is required. + +Finally, register your clock at run-time with a hardware-specific +registration function. This function simply populates struct clk_foo's +data and then passes the common struct clk parameters to the framework +with a call to:: + + clk_register(...) + +See the basic clock types in ``drivers/clk/clk-*.c`` for examples. + +Disabling clock gating of unused clocks +======================================= + +Sometimes during development it can be useful to be able to bypass the +default disabling of unused clocks. For example, if drivers aren't enabling +clocks properly but rely on them being on from the bootloader, bypassing +the disabling means that the driver will remain functional while the issues +are sorted out. + +To bypass this disabling, include "clk_ignore_unused" in the bootargs to the +kernel. + +Locking +======= + +The common clock framework uses two global locks, the prepare lock and the +enable lock. + +The enable lock is a spinlock and is held across calls to the .enable, +.disable operations. Those operations are thus not allowed to sleep, +and calls to the clk_enable(), clk_disable() API functions are allowed in +atomic context. + +For clk_is_enabled() API, it is also designed to be allowed to be used in +atomic context. However, it doesn't really make any sense to hold the enable +lock in core, unless you want to do something else with the information of +the enable state with that lock held. Otherwise, seeing if a clk is enabled is +a one-shot read of the enabled state, which could just as easily change after +the function returns because the lock is released. Thus the user of this API +needs to handle synchronizing the read of the state with whatever they're +using it for to make sure that the enable state doesn't change during that +time. + +The prepare lock is a mutex and is held across calls to all other operations. +All those operations are allowed to sleep, and calls to the corresponding API +functions are not allowed in atomic context. + +This effectively divides operations in two groups from a locking perspective. + +Drivers don't need to manually protect resources shared between the operations +of one group, regardless of whether those resources are shared by multiple +clocks or not. However, access to resources that are shared between operations +of the two groups needs to be protected by the drivers. An example of such a +resource would be a register that controls both the clock rate and the clock +enable/disable state. + +The clock framework is reentrant, in that a driver is allowed to call clock +framework functions from within its implementation of clock operations. This +can for instance cause a .set_rate operation of one clock being called from +within the .set_rate operation of another clock. This case must be considered +in the driver implementations, but the code flow is usually controlled by the +driver in that case. + +Note that locking must also be considered when code outside of the common +clock framework needs to access resources used by the clock operations. This +is considered out of scope of this document. diff --git a/Documentation/driver-api/device_connection.rst b/Documentation/driver-api/device_connection.rst index affbc5566ab0373eb7ef008827c945fd8a972160..ba364224c349b610cc0eeee87d25635e82bd55b5 100644 --- a/Documentation/driver-api/device_connection.rst +++ b/Documentation/driver-api/device_connection.rst @@ -40,4 +40,4 @@ API --- .. kernel-doc:: drivers/base/devcon.c - : functions: device_connection_find_match device_connection_find device_connection_add device_connection_remove + :functions: device_connection_find_match device_connection_find device_connection_add device_connection_remove diff --git a/Documentation/driver-api/firmware/fallback-mechanisms.rst b/Documentation/driver-api/firmware/fallback-mechanisms.rst index f353783ae0be5d2823c285538eeb7a898b4f0408..d35fed65eae977aa07d2229a441c4a4eed2eaedb 100644 --- a/Documentation/driver-api/firmware/fallback-mechanisms.rst +++ b/Documentation/driver-api/firmware/fallback-mechanisms.rst @@ -72,9 +72,12 @@ the firmware requested, and establishes it in the device hierarchy by associating the device used to make the request as the device's parent. The sysfs directory's file attributes are defined and controlled through the new device's class (firmware_class) and group (fw_dev_attr_groups). -This is actually where the original firmware_class.c file name comes from, -as originally the only firmware loading mechanism available was the -mechanism we now use as a fallback mechanism. +This is actually where the original firmware_class module name came from, +given that originally the only firmware loading mechanism available was the +mechanism we now use as a fallback mechanism, which registers a struct class +firmware_class. Because the attributes exposed are part of the module name, the +module name firmware_class cannot be renamed in the future, to ensure backward +compatibility with old userspace. To load firmware using the sysfs interface we expose a loading indicator, and a file upload firmware into: @@ -83,7 +86,7 @@ and a file upload firmware into: * /sys/$DEVPATH/data To upload firmware you will echo 1 onto the loading file to indicate -you are loading firmware. You then cat the firmware into the data file, +you are loading firmware. You then write the firmware into the data file, and you notify the kernel the firmware is ready by echo'ing 0 onto the loading file. @@ -136,7 +139,8 @@ by kobject uevents. This is specially exacerbated due to the fact that most distributions today disable CONFIG_FW_LOADER_USER_HELPER_FALLBACK. Refer to do_firmware_uevent() for details of the kobject event variables -setup. Variables passwdd with a kobject add event: +setup. The variables currently passed to userspace with a "kobject add" +event are: * FIRMWARE=firmware name * TIMEOUT=timeout value diff --git a/Documentation/driver-api/firmware/firmware_cache.rst b/Documentation/driver-api/firmware/firmware_cache.rst index 2210e5bfb332e189dd9a2ba9169370017cde9bb3..c2e69d9c6bf13c22a20c80dc9645ba03b73b0c62 100644 --- a/Documentation/driver-api/firmware/firmware_cache.rst +++ b/Documentation/driver-api/firmware/firmware_cache.rst @@ -29,8 +29,8 @@ Some implementation details about the firmware cache setup: * If an asynchronous call is used the firmware cache is only set up for a device if if the second argument (uevent) to request_firmware_nowait() is true. When uevent is true it requests that a kobject uevent be sent to - userspace for the firmware request. For details refer to the Fackback - mechanism documented below. + userspace for the firmware request through the sysfs fallback mechanism + if the firmware file is not found. * If the firmware cache is determined to be needed as per the above two criteria the firmware cache is setup by adding a devres entry for the diff --git a/Documentation/driver-api/firmware/request_firmware.rst b/Documentation/driver-api/firmware/request_firmware.rst index d5ec95a7195bf001f6fa855dc6d7fa922ba40ed7..f62bdcbfed5b58b6ef2ffd966219a950c22403c4 100644 --- a/Documentation/driver-api/firmware/request_firmware.rst +++ b/Documentation/driver-api/firmware/request_firmware.rst @@ -20,6 +20,11 @@ request_firmware .. kernel-doc:: drivers/base/firmware_loader/main.c :functions: request_firmware +firmware_request_nowarn +----------------------- +.. kernel-doc:: drivers/base/firmware_loader/main.c + :functions: firmware_request_nowarn + request_firmware_direct ----------------------- .. kernel-doc:: drivers/base/firmware_loader/main.c diff --git a/Documentation/driver-api/fpga/fpga-bridge.rst b/Documentation/driver-api/fpga/fpga-bridge.rst new file mode 100644 index 0000000000000000000000000000000000000000..2c2aaca894bf4aefdfe67500338aa1253533cb14 --- /dev/null +++ b/Documentation/driver-api/fpga/fpga-bridge.rst @@ -0,0 +1,49 @@ +FPGA Bridge +=========== + +API to implement a new FPGA bridge +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. kernel-doc:: include/linux/fpga/fpga-bridge.h + :functions: fpga_bridge + +.. kernel-doc:: include/linux/fpga/fpga-bridge.h + :functions: fpga_bridge_ops + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_create + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_free + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_register + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_unregister + +API to control an FPGA bridge +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You probably won't need these directly. FPGA regions should handle this. + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: of_fpga_bridge_get + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_get + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_put + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_get_to_list + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: of_fpga_bridge_get_to_list + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_enable + +.. kernel-doc:: drivers/fpga/fpga-bridge.c + :functions: fpga_bridge_disable diff --git a/Documentation/driver-api/fpga/fpga-mgr.rst b/Documentation/driver-api/fpga/fpga-mgr.rst new file mode 100644 index 0000000000000000000000000000000000000000..bcf2dd24e17917f4f9b5730aee2ebd3ba0cab41d --- /dev/null +++ b/Documentation/driver-api/fpga/fpga-mgr.rst @@ -0,0 +1,220 @@ +FPGA Manager +============ + +Overview +-------- + +The FPGA manager core exports a set of functions for programming an FPGA with +an image. The API is manufacturer agnostic. All manufacturer specifics are +hidden away in a low level driver which registers a set of ops with the core. +The FPGA image data itself is very manufacturer specific, but for our purposes +it's just binary data. The FPGA manager core won't parse it. + +The FPGA image to be programmed can be in a scatter gather list, a single +contiguous buffer, or a firmware file. Because allocating contiguous kernel +memory for the buffer should be avoided, users are encouraged to use a scatter +gather list instead if possible. + +The particulars for programming the image are presented in a structure (struct +fpga_image_info). This struct contains parameters such as pointers to the +FPGA image as well as image-specific particulars such as whether the image was +built for full or partial reconfiguration. + +How to support a new FPGA device +-------------------------------- + +To add another FPGA manager, write a driver that implements a set of ops. The +probe function calls fpga_mgr_register(), such as:: + + static const struct fpga_manager_ops socfpga_fpga_ops = { + .write_init = socfpga_fpga_ops_configure_init, + .write = socfpga_fpga_ops_configure_write, + .write_complete = socfpga_fpga_ops_configure_complete, + .state = socfpga_fpga_ops_state, + }; + + static int socfpga_fpga_probe(struct platform_device *pdev) + { + struct device *dev = &pdev->dev; + struct socfpga_fpga_priv *priv; + struct fpga_manager *mgr; + int ret; + + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + /* + * do ioremaps, get interrupts, etc. and save + * them in priv + */ + + mgr = fpga_mgr_create(dev, "Altera SOCFPGA FPGA Manager", + &socfpga_fpga_ops, priv); + if (!mgr) + return -ENOMEM; + + platform_set_drvdata(pdev, mgr); + + ret = fpga_mgr_register(mgr); + if (ret) + fpga_mgr_free(mgr); + + return ret; + } + + static int socfpga_fpga_remove(struct platform_device *pdev) + { + struct fpga_manager *mgr = platform_get_drvdata(pdev); + + fpga_mgr_unregister(mgr); + + return 0; + } + + +The ops will implement whatever device specific register writes are needed to +do the programming sequence for this particular FPGA. These ops return 0 for +success or negative error codes otherwise. + +The programming sequence is:: + 1. .write_init + 2. .write or .write_sg (may be called once or multiple times) + 3. .write_complete + +The .write_init function will prepare the FPGA to receive the image data. The +buffer passed into .write_init will be atmost .initial_header_size bytes long, +if the whole bitstream is not immediately available then the core code will +buffer up at least this much before starting. + +The .write function writes a buffer to the FPGA. The buffer may be contain the +whole FPGA image or may be a smaller chunk of an FPGA image. In the latter +case, this function is called multiple times for successive chunks. This interface +is suitable for drivers which use PIO. + +The .write_sg version behaves the same as .write except the input is a sg_table +scatter list. This interface is suitable for drivers which use DMA. + +The .write_complete function is called after all the image has been written +to put the FPGA into operating mode. + +The ops include a .state function which will read the hardware FPGA manager and +return a code of type enum fpga_mgr_states. It doesn't result in a change in +hardware state. + +How to write an image buffer to a supported FPGA +------------------------------------------------ + +Some sample code:: + + #include + + struct fpga_manager *mgr; + struct fpga_image_info *info; + int ret; + + /* + * Get a reference to FPGA manager. The manager is not locked, so you can + * hold onto this reference without it preventing programming. + * + * This example uses the device node of the manager. Alternatively, use + * fpga_mgr_get(dev) instead if you have the device. + */ + mgr = of_fpga_mgr_get(mgr_node); + + /* struct with information about the FPGA image to program. */ + info = fpga_image_info_alloc(dev); + + /* flags indicates whether to do full or partial reconfiguration */ + info->flags = FPGA_MGR_PARTIAL_RECONFIG; + + /* + * At this point, indicate where the image is. This is pseudo-code; you're + * going to use one of these three. + */ + if (image is in a scatter gather table) { + + info->sgt = [your scatter gather table] + + } else if (image is in a buffer) { + + info->buf = [your image buffer] + info->count = [image buffer size] + + } else if (image is in a firmware file) { + + info->firmware_name = devm_kstrdup(dev, firmware_name, GFP_KERNEL); + + } + + /* Get exclusive control of FPGA manager */ + ret = fpga_mgr_lock(mgr); + + /* Load the buffer to the FPGA */ + ret = fpga_mgr_buf_load(mgr, &info, buf, count); + + /* Release the FPGA manager */ + fpga_mgr_unlock(mgr); + fpga_mgr_put(mgr); + + /* Deallocate the image info if you're done with it */ + fpga_image_info_free(info); + +API for implementing a new FPGA Manager driver +---------------------------------------------- + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_manager + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_manager_ops + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_create + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_free + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_register + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_unregister + +API for programming a FPGA +-------------------------- + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_image_info + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_mgr_states + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_image_info_alloc + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_image_info_free + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: of_fpga_mgr_get + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_get + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_put + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_lock + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_unlock + +.. kernel-doc:: include/linux/fpga/fpga-mgr.h + :functions: fpga_mgr_states + +Note - use :c:func:`fpga_region_program_fpga()` instead of :c:func:`fpga_mgr_load()` + +.. kernel-doc:: drivers/fpga/fpga-mgr.c + :functions: fpga_mgr_load diff --git a/Documentation/driver-api/fpga/fpga-region.rst b/Documentation/driver-api/fpga/fpga-region.rst new file mode 100644 index 0000000000000000000000000000000000000000..f89e4a3117224079bef5a6fdb40485c7939f7058 --- /dev/null +++ b/Documentation/driver-api/fpga/fpga-region.rst @@ -0,0 +1,102 @@ +FPGA Region +=========== + +Overview +-------- + +This document is meant to be an brief overview of the FPGA region API usage. A +more conceptual look at regions can be found in the Device Tree binding +document [#f1]_. + +For the purposes of this API document, let's just say that a region associates +an FPGA Manager and a bridge (or bridges) with a reprogrammable region of an +FPGA or the whole FPGA. The API provides a way to register a region and to +program a region. + +Currently the only layer above fpga-region.c in the kernel is the Device Tree +support (of-fpga-region.c) described in [#f1]_. The DT support layer uses regions +to program the FPGA and then DT to handle enumeration. The common region code +is intended to be used by other schemes that have other ways of accomplishing +enumeration after programming. + +An fpga-region can be set up to know the following things: + + * which FPGA manager to use to do the programming + + * which bridges to disable before programming and enable afterwards. + +Additional info needed to program the FPGA image is passed in the struct +fpga_image_info including: + + * pointers to the image as either a scatter-gather buffer, a contiguous + buffer, or the name of firmware file + + * flags indicating specifics such as whether the image if for partial + reconfiguration. + +How to program a FPGA using a region +------------------------------------ + +First, allocate the info struct:: + + info = fpga_image_info_alloc(dev); + if (!info) + return -ENOMEM; + +Set flags as needed, i.e.:: + + info->flags |= FPGA_MGR_PARTIAL_RECONFIG; + +Point to your FPGA image, such as:: + + info->sgt = &sgt; + +Add info to region and do the programming:: + + region->info = info; + ret = fpga_region_program_fpga(region); + +:c:func:`fpga_region_program_fpga()` operates on info passed in the +fpga_image_info (region->info). This function will attempt to: + + * lock the region's mutex + * lock the region's FPGA manager + * build a list of FPGA bridges if a method has been specified to do so + * disable the bridges + * program the FPGA + * re-enable the bridges + * release the locks + +Then you will want to enumerate whatever hardware has appeared in the FPGA. + +How to add a new FPGA region +---------------------------- + +An example of usage can be seen in the probe function of [#f2]_. + +.. [#f1] ../devicetree/bindings/fpga/fpga-region.txt +.. [#f2] ../../drivers/fpga/of-fpga-region.c + +API to program a FGPA +--------------------- + +.. kernel-doc:: drivers/fpga/fpga-region.c + :functions: fpga_region_program_fpga + +API to add a new FPGA region +---------------------------- + +.. kernel-doc:: include/linux/fpga/fpga-region.h + :functions: fpga_region + +.. kernel-doc:: drivers/fpga/fpga-region.c + :functions: fpga_region_create + +.. kernel-doc:: drivers/fpga/fpga-region.c + :functions: fpga_region_free + +.. kernel-doc:: drivers/fpga/fpga-region.c + :functions: fpga_region_register + +.. kernel-doc:: drivers/fpga/fpga-region.c + :functions: fpga_region_unregister diff --git a/Documentation/driver-api/fpga/index.rst b/Documentation/driver-api/fpga/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..c51e5ebd544abc7a4a45b3bd49887a630687c227 --- /dev/null +++ b/Documentation/driver-api/fpga/index.rst @@ -0,0 +1,13 @@ +============== +FPGA Subsystem +============== + +:Author: Alan Tull + +.. toctree:: + :maxdepth: 2 + + intro + fpga-mgr + fpga-bridge + fpga-region diff --git a/Documentation/driver-api/fpga/intro.rst b/Documentation/driver-api/fpga/intro.rst new file mode 100644 index 0000000000000000000000000000000000000000..51cd81dbb4dc5ccf5846ee0380f62d65611edc39 --- /dev/null +++ b/Documentation/driver-api/fpga/intro.rst @@ -0,0 +1,54 @@ +Introduction +============ + +The FPGA subsystem supports reprogramming FPGAs dynamically under +Linux. Some of the core intentions of the FPGA subsystems are: + +* The FPGA subsystem is vendor agnostic. + +* The FPGA subsystem separates upper layers (userspace interfaces and + enumeration) from lower layers that know how to program a specific + FPGA. + +* Code should not be shared between upper and lower layers. This + should go without saying. If that seems necessary, there's probably + framework functionality that that can be added that will benefit + other users. Write the linux-fpga mailing list and maintainers and + seek out a solution that expands the framework for broad reuse. + +* Generally, when adding code, think of the future. Plan for re-use. + +The framework in the kernel is divided into: + +FPGA Manager +------------ + +If you are adding a new FPGA or a new method of programming a FPGA, +this is the subsystem for you. Low level FPGA manager drivers contain +the knowledge of how to program a specific device. This subsystem +includes the framework in fpga-mgr.c and the low level drivers that +are registered with it. + +FPGA Bridge +----------- + +FPGA Bridges prevent spurious signals from going out of a FPGA or a +region of a FPGA during programming. They are disabled before +programming begins and re-enabled afterwards. An FPGA bridge may be +actual hard hardware that gates a bus to a cpu or a soft ("freeze") +bridge in FPGA fabric that surrounds a partial reconfiguration region +of an FPGA. This subsystem includes fpga-bridge.c and the low level +drivers that are registered with it. + +FPGA Region +----------- + +If you are adding a new interface to the FPGA framework, add it on top +of a FPGA region to allow the most reuse of your interface. + +The FPGA Region framework (fpga-region.c) associates managers and +bridges as reconfigurable regions. A region may refer to the whole +FPGA in full reconfiguration or to a partial reconfiguration region. + +The Device Tree FPGA Region support (of-fpga-region.c) handles +reprogramming FPGAs when device tree overlays are applied. diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst index 25d62b2e9fd0b26968bbadcebf780d7d70c0a6a8..2c112553df841444730beee24d30485857d99145 100644 --- a/Documentation/driver-api/gpio/board.rst +++ b/Documentation/driver-api/gpio/board.rst @@ -177,3 +177,19 @@ mapping and is thus transparent to GPIO consumers. A set of functions such as gpiod_set_value() is available to work with the new descriptor-oriented interface. + +Boards using platform data can also hog GPIO lines by defining GPIO hog tables. + +.. code-block:: c + + struct gpiod_hog gpio_hog_table[] = { + GPIO_HOG("gpio.0", 10, "foo", GPIO_ACTIVE_LOW, GPIOD_OUT_HIGH), + { } + }; + +And the table can be added to the board code as follows:: + + gpiod_add_hogs(gpio_hog_table); + +The line will be hogged as soon as the gpiochip is created or - in case the +chip was created earlier - when the hog table is registered. diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst index c71a50d85b50170dcfbb0323f24c29a0494d1648..aa03f389d41d6a794f16607ee076f248848da17a 100644 --- a/Documentation/driver-api/gpio/consumer.rst +++ b/Documentation/driver-api/gpio/consumer.rst @@ -57,7 +57,7 @@ device that displays digits), an additional index argument can be specified:: enum gpiod_flags flags) For a more detailed description of the con_id parameter in the DeviceTree case -see Documentation/gpio/board.txt +see Documentation/driver-api/gpio/board.rst The flags parameter is used to optionally specify a direction and initial value for the GPIO. Values can be: diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst index 505ee906d7d9590d68484890cebf753a40ec22be..cbe0242842d1540f833bae20b304374303bc6b09 100644 --- a/Documentation/driver-api/gpio/driver.rst +++ b/Documentation/driver-api/gpio/driver.rst @@ -44,7 +44,7 @@ common to each controller of that type: - methods to establish GPIO line direction - methods used to access GPIO line values - - method to set electrical configuration to a a given GPIO line + - method to set electrical configuration for a given GPIO line - method to return the IRQ number associated to a given GPIO line - flag saying whether calls to its methods may sleep - optional line names array to identify lines @@ -143,7 +143,7 @@ resistor will make the line tend to high level unless one of the transistors on the rail actively pulls it down. The level on the line will go as high as the VDD on the pull-up resistor, which -may be higher than the level supported by the transistor, achieveing a +may be higher than the level supported by the transistor, achieving a level-shift to the higher VDD. Integrated electronics often have an output driver stage in the form of a CMOS @@ -382,7 +382,7 @@ Real-Time compliance for GPIO IRQ chips Any provider of irqchips needs to be carefully tailored to support Real Time preemption. It is desirable that all irqchips in the GPIO subsystem keep this -in mind and does the proper testing to assure they are real time-enabled. +in mind and do the proper testing to assure they are real time-enabled. So, pay attention on above " RT_FULL:" notes, please. The following is a checklist to follow when preparing a driver for real time-compliance: diff --git a/Documentation/driver-api/gpio/drivers-on-gpio.rst b/Documentation/driver-api/gpio/drivers-on-gpio.rst index 7da0c1dd1f7a4936c1787022fbf4dfcd011fa584..f3a189320e11fc28705292db44150b44b224658d 100644 --- a/Documentation/driver-api/gpio/drivers-on-gpio.rst +++ b/Documentation/driver-api/gpio/drivers-on-gpio.rst @@ -85,6 +85,10 @@ hardware descriptions such as device tree or ACPI: any other serio bus to the system and makes it possible to connect drivers for e.g. keyboards and other PS/2 protocol based devices. +- cec-gpio: drivers/media/platform/cec-gpio/ is used to interact with a CEC + Consumer Electronics Control bus using only GPIO. It is used to communicate + with devices on the HDMI bus. + Apart from this there are special GPIO drivers in subsystems like MMC/SD to read card detect and write protect GPIO lines, and in the TTY serial subsystem to emulate MCTRL (modem control) signals CTS/RTS by using two GPIO lines. The diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 6d8352c0f3547a9f63fa7db614e1466f2b822616..6d9f2f9fe20ee65623833326c98b79d5ded44fbd 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -17,7 +17,9 @@ available subsections can be seen below. basics infrastructure pm/index + clk device-io + device_connection dma-buf device_link message-based @@ -34,6 +36,7 @@ available subsections can be seen below. edac scsi libata + target mtdnand miscellaneous w1 @@ -49,6 +52,7 @@ available subsections can be seen below. dmaengine/index slimbus soundwire/index + fpga/index .. only:: subproject and html diff --git a/Documentation/driver-api/infrastructure.rst b/Documentation/driver-api/infrastructure.rst index bee1b9a1702f1cc6c89811aff6b8bdbc1eefb0b0..6172f3cc3d0b2109916cfccda2da1836065f8766 100644 --- a/Documentation/driver-api/infrastructure.rst +++ b/Documentation/driver-api/infrastructure.rst @@ -49,10 +49,10 @@ Device Drivers Base Device Drivers DMA Management ----------------------------- -.. kernel-doc:: drivers/base/dma-coherent.c +.. kernel-doc:: kernel/dma/coherent.c :export: -.. kernel-doc:: drivers/base/dma-mapping.c +.. kernel-doc:: kernel/dma/mapping.c :export: Device drivers PnP support diff --git a/Documentation/driver-api/scsi.rst b/Documentation/driver-api/scsi.rst index 31ad0fed6763bcdac1df622ae8b66433c6e2a5a5..64b231d125e0fc0f5396e18022898086b2d5be40 100644 --- a/Documentation/driver-api/scsi.rst +++ b/Documentation/driver-api/scsi.rst @@ -334,5 +334,5 @@ todo ~~~~ Parallel (fast/wide/ultra) SCSI, USB, SATA, SAS, Fibre Channel, -FireWire, ATAPI devices, Infiniband, I2O, iSCSI, Parallel ports, +FireWire, ATAPI devices, Infiniband, I2O, Parallel ports, netlink... diff --git a/Documentation/driver-api/soundwire/error_handling.rst b/Documentation/driver-api/soundwire/error_handling.rst new file mode 100644 index 0000000000000000000000000000000000000000..aa3a0a23a0668c8e09694bbb4a6ee99b2721471a --- /dev/null +++ b/Documentation/driver-api/soundwire/error_handling.rst @@ -0,0 +1,65 @@ +======================== +SoundWire Error Handling +======================== + +The SoundWire PHY was designed with care and errors on the bus are going to +be very unlikely, and if they happen it should be limited to single bit +errors. Examples of this design can be found in the synchronization +mechanism (sync loss after two errors) and short CRCs used for the Bulk +Register Access. + +The errors can be detected with multiple mechanisms: + +1. Bus clash or parity errors: This mechanism relies on low-level detectors + that are independent of the payload and usages, and they cover both control + and audio data. The current implementation only logs such errors. + Improvements could be invalidating an entire programming sequence and + restarting from a known position. In the case of such errors outside of a + control/command sequence, there is no concealment or recovery for audio + data enabled by the SoundWire protocol, the location of the error will also + impact its audibility (most-significant bits will be more impacted in PCM), + and after a number of such errors are detected the bus might be reset. Note + that bus clashes due to programming errors (two streams using the same bit + slots) or electrical issues during the transmit/receive transition cannot + be distinguished, although a recurring bus clash when audio is enabled is a + indication of a bus allocation issue. The interrupt mechanism can also help + identify Slaves which detected a Bus Clash or a Parity Error, but they may + not be responsible for the errors so resetting them individually is not a + viable recovery strategy. + +2. Command status: Each command is associated with a status, which only + covers transmission of the data between devices. The ACK status indicates + that the command was received and will be executed by the end of the + current frame. A NAK indicates that the command was in error and will not + be applied. In case of a bad programming (command sent to non-existent + Slave or to a non-implemented register) or electrical issue, no response + signals the command was ignored. Some Master implementations allow for a + command to be retransmitted several times. If the retransmission fails, + backtracking and restarting the entire programming sequence might be a + solution. Alternatively some implementations might directly issue a bus + reset and re-enumerate all devices. + +3. Timeouts: In a number of cases such as ChannelPrepare or + ClockStopPrepare, the bus driver is supposed to poll a register field until + it transitions to a NotFinished value of zero. The MIPI SoundWire spec 1.1 + does not define timeouts but the MIPI SoundWire DisCo document adds + recommendation on timeouts. If such configurations do not complete, the + driver will return a -ETIMEOUT. Such timeouts are symptoms of a faulty + Slave device and are likely impossible to recover from. + +Errors during global reconfiguration sequences are extremely difficult to +handle: + +1. BankSwitch: An error during the last command issuing a BankSwitch is + difficult to backtrack from. Retransmitting the Bank Switch command may be + possible in a single segment setup, but this can lead to synchronization + problems when enabling multiple bus segments (a command with side effects + such as frame reconfiguration would be handled at different times). A global + hard-reset might be the best solution. + +Note that SoundWire does not provide a mechanism to detect illegal values +written in valid registers. In a number of cases the standard even mentions +that the Slave might behave in implementation-defined ways. The bus +implementation does not provide a recovery mechanism for such errors, Slave +or Master driver implementers are responsible for writing valid values in +valid registers and implement additional range checking if needed. diff --git a/Documentation/driver-api/soundwire/index.rst b/Documentation/driver-api/soundwire/index.rst index 647e946547524b763911b6e833b89a8886a8c820..6db026028f273b27dfe2a78253e78eaa4180c8b6 100644 --- a/Documentation/driver-api/soundwire/index.rst +++ b/Documentation/driver-api/soundwire/index.rst @@ -6,6 +6,9 @@ SoundWire Documentation :maxdepth: 1 summary + stream + error_handling + locking .. only:: subproject diff --git a/Documentation/driver-api/soundwire/locking.rst b/Documentation/driver-api/soundwire/locking.rst new file mode 100644 index 0000000000000000000000000000000000000000..253f73555255091b5abceafce9c103ca6a3cd771 --- /dev/null +++ b/Documentation/driver-api/soundwire/locking.rst @@ -0,0 +1,106 @@ +================= +SoundWire Locking +================= + +This document explains locking mechanism of the SoundWire Bus. Bus uses +following locks in order to avoid race conditions in Bus operations on +shared resources. + + - Bus lock + + - Message lock + +Bus lock +======== + +SoundWire Bus lock is a mutex and is part of Bus data structure +(sdw_bus) which is used for every Bus instance. This lock is used to +serialize each of the following operations(s) within SoundWire Bus instance. + + - Addition and removal of Slave(s), changing Slave status. + + - Prepare, Enable, Disable and De-prepare stream operations. + + - Access of Stream data structure. + +Message lock +============ + +SoundWire message transfer lock. This mutex is part of +Bus data structure (sdw_bus). This lock is used to serialize the message +transfers (read/write) within a SoundWire Bus instance. + +Below examples show how locks are acquired. + +Example 1 +--------- + +Message transfer. + + 1. For every message transfer + + a. Acquire Message lock. + + b. Transfer message (Read/Write) to Slave1 or broadcast message on + Bus in case of bank switch. + + c. Release Message lock :: + + +----------+ +---------+ + | | | | + | Bus | | Master | + | | | Driver | + | | | | + +----+-----+ +----+----+ + | | + | bus->ops->xfer_msg() | + <-------------------------------+ a. Acquire Message lock + | | b. Transfer message + | | + +-------------------------------> c. Release Message lock + | return success/error | d. Return success/error + | | + + + + +Example 2 +--------- + +Prepare operation. + + 1. Acquire lock for Bus instance associated with Master 1. + + 2. For every message transfer in Prepare operation + + a. Acquire Message lock. + + b. Transfer message (Read/Write) to Slave1 or broadcast message on + Bus in case of bank switch. + + c. Release Message lock. + + 3. Release lock for Bus instance associated with Master 1 :: + + +----------+ +---------+ + | | | | + | Bus | | Master | + | | | Driver | + | | | | + +----+-----+ +----+----+ + | | + | sdw_prepare_stream() | + <-------------------------------+ 1. Acquire bus lock + | | 2. Perform stream prepare + | | + | | + | bus->ops->xfer_msg() | + <-------------------------------+ a. Acquire Message lock + | | b. Transfer message + | | + +-------------------------------> c. Release Message lock + | return success/error | d. Return success/error + | | + | | + | return success/error | 3. Release bus lock + +-------------------------------> 4. Return success/error + | | + + + diff --git a/Documentation/driver-api/soundwire/stream.rst b/Documentation/driver-api/soundwire/stream.rst new file mode 100644 index 0000000000000000000000000000000000000000..29121aa55fb96a3e431859797ccc699eb8891f78 --- /dev/null +++ b/Documentation/driver-api/soundwire/stream.rst @@ -0,0 +1,372 @@ +========================= +Audio Stream in SoundWire +========================= + +An audio stream is a logical or virtual connection created between + + (1) System memory buffer(s) and Codec(s) + + (2) DSP memory buffer(s) and Codec(s) + + (3) FIFO(s) and Codec(s) + + (4) Codec(s) and Codec(s) + +which is typically driven by a DMA(s) channel through the data link. An +audio stream contains one or more channels of data. All channels within +stream must have same sample rate and same sample size. + +Assume a stream with two channels (Left & Right) is opened using SoundWire +interface. Below are some ways a stream can be represented in SoundWire. + +Stream Sample in memory (System memory, DSP memory or FIFOs) :: + + ------------------------- + | L | R | L | R | L | R | + ------------------------- + +Example 1: Stereo Stream with L and R channels is rendered from Master to +Slave. Both Master and Slave is using single port. :: + + +---------------+ Clock Signal +---------------+ + | Master +----------------------------------+ Slave | + | Interface | | Interface | + | | | 1 | + | | Data Signal | | + | L + R +----------------------------------+ L + R | + | (Data) | Data Direction | (Data) | + +---------------+ +-----------------------> +---------------+ + + +Example 2: Stereo Stream with L and R channels is captured from Slave to +Master. Both Master and Slave is using single port. :: + + + +---------------+ Clock Signal +---------------+ + | Master +----------------------------------+ Slave | + | Interface | | Interface | + | | | 1 | + | | Data Signal | | + | L + R +----------------------------------+ L + R | + | (Data) | Data Direction | (Data) | + +---------------+ <-----------------------+ +---------------+ + + +Example 3: Stereo Stream with L and R channels is rendered by Master. Each +of the L and R channel is received by two different Slaves. Master and both +Slaves are using single port. :: + + +---------------+ Clock Signal +---------------+ + | Master +---------+------------------------+ Slave | + | Interface | | | Interface | + | | | | 1 | + | | | Data Signal | | + | L + R +---+------------------------------+ L | + | (Data) | | | Data Direction | (Data) | + +---------------+ | | +-------------> +---------------+ + | | + | | + | | +---------------+ + | +----------------------> | Slave | + | | Interface | + | | 2 | + | | | + +----------------------------> | R | + | (Data) | + +---------------+ + + +Example 4: Stereo Stream with L and R channel is rendered by two different +Ports of the Master and is received by only single Port of the Slave +interface. :: + + +--------------------+ + | | + | +--------------+ +----------------+ + | | || | | + | | Data Port || L Channel | | + | | 1 |------------+ | | + | | L Channel || | +-----+----+ | + | | (Data) || | L + R Channel || Data | | + | Master +----------+ | +---+---------> || Port | | + | Interface | | || 1 | | + | +--------------+ | || | | + | | || | +----------+ | + | | Data Port |------------+ | | + | | 2 || R Channel | Slave | + | | R Channel || | Interface | + | | (Data) || | 1 | + | +--------------+ Clock Signal | L + R | + | +---------------------------> | (Data) | + +--------------------+ | | + +----------------+ + +SoundWire Stream Management flow +================================ + +Stream definitions +------------------ + + (1) Current stream: This is classified as the stream on which operation has + to be performed like prepare, enable, disable, de-prepare etc. + + (2) Active stream: This is classified as the stream which is already active + on Bus other than current stream. There can be multiple active streams + on the Bus. + +SoundWire Bus manages stream operations for each stream getting +rendered/captured on the SoundWire Bus. This section explains Bus operations +done for each of the stream allocated/released on Bus. Following are the +stream states maintained by the Bus for each of the audio stream. + + +SoundWire stream states +----------------------- + +Below shows the SoundWire stream states and state transition diagram. :: + + +-----------+ +------------+ +----------+ +----------+ + | ALLOCATED +---->| CONFIGURED +---->| PREPARED +---->| ENABLED | + | STATE | | STATE | | STATE | | STATE | + +-----------+ +------------+ +----------+ +----+-----+ + ^ + | + | + v + +----------+ +------------+ +----+-----+ + | RELEASED |<----------+ DEPREPARED |<-------+ DISABLED | + | STATE | | STATE | | STATE | + +----------+ +------------+ +----------+ + +NOTE: State transition between prepare and deprepare is supported in Spec +but not in the software (subsystem) + +NOTE2: Stream state transition checks need to be handled by caller +framework, for example ALSA/ASoC. No checks for stream transition exist in +SoundWire subsystem. + +Stream State Operations +----------------------- + +Below section explains the operations done by the Bus on Master(s) and +Slave(s) as part of stream state transitions. + +SDW_STREAM_ALLOCATED +~~~~~~~~~~~~~~~~~~~~ + +Allocation state for stream. This is the entry state +of the stream. Operations performed before entering in this state: + + (1) A stream runtime is allocated for the stream. This stream + runtime is used as a reference for all the operations performed + on the stream. + + (2) The resources required for holding stream runtime information are + allocated and initialized. This holds all stream related information + such as stream type (PCM/PDM) and parameters, Master and Slave + interface associated with the stream, stream state etc. + +After all above operations are successful, stream state is set to +``SDW_STREAM_ALLOCATED``. + +Bus implements below API for allocate a stream which needs to be called once +per stream. From ASoC DPCM framework, this stream state maybe linked to +.startup() operation. + + .. code-block:: c + int sdw_alloc_stream(char * stream_name); + + +SDW_STREAM_CONFIGURED +~~~~~~~~~~~~~~~~~~~~~ + +Configuration state of stream. Operations performed before entering in +this state: + + (1) The resources allocated for stream information in SDW_STREAM_ALLOCATED + state are updated here. This includes stream parameters, Master(s) + and Slave(s) runtime information associated with current stream. + + (2) All the Master(s) and Slave(s) associated with current stream provide + the port information to Bus which includes port numbers allocated by + Master(s) and Slave(s) for current stream and their channel mask. + +After all above operations are successful, stream state is set to +``SDW_STREAM_CONFIGURED``. + +Bus implements below APIs for CONFIG state which needs to be called by +the respective Master(s) and Slave(s) associated with stream. These APIs can +only be invoked once by respective Master(s) and Slave(s). From ASoC DPCM +framework, this stream state is linked to .hw_params() operation. + + .. code-block:: c + int sdw_stream_add_master(struct sdw_bus * bus, + struct sdw_stream_config * stream_config, + struct sdw_ports_config * ports_config, + struct sdw_stream_runtime * stream); + + int sdw_stream_add_slave(struct sdw_slave * slave, + struct sdw_stream_config * stream_config, + struct sdw_ports_config * ports_config, + struct sdw_stream_runtime * stream); + + +SDW_STREAM_PREPARED +~~~~~~~~~~~~~~~~~~~ + +Prepare state of stream. Operations performed before entering in this state: + + (1) Bus parameters such as bandwidth, frame shape, clock frequency, + are computed based on current stream as well as already active + stream(s) on Bus. Re-computation is required to accommodate current + stream on the Bus. + + (2) Transport and port parameters of all Master(s) and Slave(s) port(s) are + computed for the current as well as already active stream based on frame + shape and clock frequency computed in step 1. + + (3) Computed Bus and transport parameters are programmed in Master(s) and + Slave(s) registers. The banked registers programming is done on the + alternate bank (bank currently unused). Port(s) are enabled for the + already active stream(s) on the alternate bank (bank currently unused). + This is done in order to not disrupt already active stream(s). + + (4) Once all the values are programmed, Bus initiates switch to alternate + bank where all new values programmed gets into effect. + + (5) Ports of Master(s) and Slave(s) for current stream are prepared by + programming PrepareCtrl register. + +After all above operations are successful, stream state is set to +``SDW_STREAM_PREPARED``. + +Bus implements below API for PREPARE state which needs to be called once per +stream. From ASoC DPCM framework, this stream state is linked to +.prepare() operation. + + .. code-block:: c + int sdw_prepare_stream(struct sdw_stream_runtime * stream); + + +SDW_STREAM_ENABLED +~~~~~~~~~~~~~~~~~~ + +Enable state of stream. The data port(s) are enabled upon entering this state. +Operations performed before entering in this state: + + (1) All the values computed in SDW_STREAM_PREPARED state are programmed + in alternate bank (bank currently unused). It includes programming of + already active stream(s) as well. + + (2) All the Master(s) and Slave(s) port(s) for the current stream are + enabled on alternate bank (bank currently unused) by programming + ChannelEn register. + + (3) Once all the values are programmed, Bus initiates switch to alternate + bank where all new values programmed gets into effect and port(s) + associated with current stream are enabled. + +After all above operations are successful, stream state is set to +``SDW_STREAM_ENABLED``. + +Bus implements below API for ENABLE state which needs to be called once per +stream. From ASoC DPCM framework, this stream state is linked to +.trigger() start operation. + + .. code-block:: c + int sdw_enable_stream(struct sdw_stream_runtime * stream); + +SDW_STREAM_DISABLED +~~~~~~~~~~~~~~~~~~~ + +Disable state of stream. The data port(s) are disabled upon exiting this state. +Operations performed before entering in this state: + + (1) All the Master(s) and Slave(s) port(s) for the current stream are + disabled on alternate bank (bank currently unused) by programming + ChannelEn register. + + (2) All the current configuration of Bus and active stream(s) are programmed + into alternate bank (bank currently unused). + + (3) Once all the values are programmed, Bus initiates switch to alternate + bank where all new values programmed gets into effect and port(s) associated + with current stream are disabled. + +After all above operations are successful, stream state is set to +``SDW_STREAM_DISABLED``. + +Bus implements below API for DISABLED state which needs to be called once +per stream. From ASoC DPCM framework, this stream state is linked to +.trigger() stop operation. + + .. code-block:: c + int sdw_disable_stream(struct sdw_stream_runtime * stream); + + +SDW_STREAM_DEPREPARED +~~~~~~~~~~~~~~~~~~~~~ + +De-prepare state of stream. Operations performed before entering in this +state: + + (1) All the port(s) of Master(s) and Slave(s) for current stream are + de-prepared by programming PrepareCtrl register. + + (2) The payload bandwidth of current stream is reduced from the total + bandwidth requirement of bus and new parameters calculated and + applied by performing bank switch etc. + +After all above operations are successful, stream state is set to +``SDW_STREAM_DEPREPARED``. + +Bus implements below API for DEPREPARED state which needs to be called once +per stream. From ASoC DPCM framework, this stream state is linked to +.trigger() stop operation. + + .. code-block:: c + int sdw_deprepare_stream(struct sdw_stream_runtime * stream); + + +SDW_STREAM_RELEASED +~~~~~~~~~~~~~~~~~~~ + +Release state of stream. Operations performed before entering in this state: + + (1) Release port resources for all Master(s) and Slave(s) port(s) + associated with current stream. + + (2) Release Master(s) and Slave(s) runtime resources associated with + current stream. + + (3) Release stream runtime resources associated with current stream. + +After all above operations are successful, stream state is set to +``SDW_STREAM_RELEASED``. + +Bus implements below APIs for RELEASE state which needs to be called by +all the Master(s) and Slave(s) associated with stream. From ASoC DPCM +framework, this stream state is linked to .hw_free() operation. + + .. code-block:: c + int sdw_stream_remove_master(struct sdw_bus * bus, + struct sdw_stream_runtime * stream); + int sdw_stream_remove_slave(struct sdw_slave * slave, + struct sdw_stream_runtime * stream); + + +The .shutdown() ASoC DPCM operation calls below Bus API to release +stream assigned as part of ALLOCATED state. + +In .shutdown() the data structure maintaining stream state are freed up. + + .. code-block:: c + void sdw_release_stream(struct sdw_stream_runtime * stream); + +Not Supported +============= + +1. A single port with multiple channels supported cannot be used between two +streams or across stream. For example a port with 4 channels cannot be used +to handle 2 independent stereo streams even though it's possible in theory +in SoundWire. diff --git a/Documentation/driver-api/target.rst b/Documentation/driver-api/target.rst new file mode 100644 index 0000000000000000000000000000000000000000..4363611dd86d1dba4696c2574d2b83cdf8419321 --- /dev/null +++ b/Documentation/driver-api/target.rst @@ -0,0 +1,64 @@ +================================= +target and iSCSI Interfaces Guide +================================= + +Introduction and Overview +========================= + +TBD + +Target core device interfaces +============================= + +.. kernel-doc:: drivers/target/target_core_device.c + :export: + +Target core transport interfaces +================================ + +.. kernel-doc:: drivers/target/target_core_transport.c + :export: + +Target-supported userspace I/O +============================== + +.. kernel-doc:: drivers/target/target_core_user.c + :doc: Userspace I/O + +.. kernel-doc:: include/uapi/linux/target_core_user.h + :doc: Ring Design + +iSCSI helper functions +====================== + +.. kernel-doc:: drivers/scsi/libiscsi.c + :export: + + +iSCSI boot information +====================== + +.. kernel-doc:: drivers/scsi/iscsi_boot_sysfs.c + :export: + + +iSCSI transport class +===================== + +The file drivers/scsi/scsi_transport_iscsi.c defines transport +attributes for the iSCSI class, which sends SCSI packets over TCP/IP +connections. + +.. kernel-doc:: drivers/scsi/scsi_transport_iscsi.c + :export: + + +iSCSI TCP interfaces +==================== + +.. kernel-doc:: drivers/scsi/iscsi_tcp.c + :internal: + +.. kernel-doc:: drivers/scsi/libiscsi_tcp.c + :export: + diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst index 92056c20e070649debad6b84b4442413c1611678..fb2eb73be4a38812b6f4bfaf3bc710c0d7055367 100644 --- a/Documentation/driver-api/uio-howto.rst +++ b/Documentation/driver-api/uio-howto.rst @@ -711,7 +711,8 @@ The vmbus device regions are mapped into uio device resources: If a subchannel is created by a request to host, then the uio_hv_generic device driver will create a sysfs binary file for the per-channel ring buffer. -For example: +For example:: + /sys/bus/vmbus/devices/3811fe4d-0fa0-4b62-981a-74fc1084c757/channels/21/ring Further information diff --git a/Documentation/driver-api/usb/dwc3.rst b/Documentation/driver-api/usb/dwc3.rst index c3dc84a50ce54013aca2cd1b83b65e704b918e3b..8b36ff11cef9a9e02a303510143c850e6e009594 100644 --- a/Documentation/driver-api/usb/dwc3.rst +++ b/Documentation/driver-api/usb/dwc3.rst @@ -674,9 +674,8 @@ operations, both of which can be traced. Format is:: __entry->flags & DWC3_EP_ENABLED ? 'E' : 'e', __entry->flags & DWC3_EP_STALL ? 'S' : 's', __entry->flags & DWC3_EP_WEDGE ? 'W' : 'w', - __entry->flags & DWC3_EP_BUSY ? 'B' : 'b', + __entry->flags & DWC3_EP_TRANSFER_STARTED ? 'B' : 'b', __entry->flags & DWC3_EP_PENDING_REQUEST ? 'P' : 'p', - __entry->flags & DWC3_EP_MISSED_ISOC ? 'M' : 'm', __entry->flags & DWC3_EP_END_TRANSFER_PENDING ? 'E' : 'e', __entry->direction ? '<' : '>' ) diff --git a/Documentation/features/core/BPF-JIT/arch-support.txt b/Documentation/features/core/BPF-JIT/arch-support.txt deleted file mode 100644 index 0b96b4e1e7d4a8e2db2515925486196cf67ea504..0000000000000000000000000000000000000000 --- a/Documentation/features/core/BPF-JIT/arch-support.txt +++ /dev/null @@ -1,31 +0,0 @@ -# -# Feature name: BPF-JIT -# Kconfig: HAVE_BPF_JIT -# description: arch supports BPF JIT optimizations -# - ----------------------- - | arch |status| - ----------------------- - | alpha: | TODO | - | arc: | TODO | - | arm: | ok | - | arm64: | ok | - | c6x: | TODO | - | h8300: | TODO | - | hexagon: | TODO | - | ia64: | TODO | - | m68k: | TODO | - | microblaze: | TODO | - | mips: | ok | - | nios2: | TODO | - | openrisc: | TODO | - | parisc: | TODO | - | powerpc: | ok | - | s390: | ok | - | sh: | TODO | - | sparc: | ok | - | um: | TODO | - | unicore32: | TODO | - | x86: | ok | - | xtensa: | TODO | - ----------------------- diff --git a/Documentation/features/core/cBPF-JIT/arch-support.txt b/Documentation/features/core/cBPF-JIT/arch-support.txt new file mode 100644 index 0000000000000000000000000000000000000000..90459cdde314356949dc8da3d5da0f8b51fc5c18 --- /dev/null +++ b/Documentation/features/core/cBPF-JIT/arch-support.txt @@ -0,0 +1,33 @@ +# +# Feature name: cBPF-JIT +# Kconfig: HAVE_CBPF_JIT +# description: arch supports cBPF JIT optimizations +# + ----------------------- + | arch |status| + ----------------------- + | alpha: | TODO | + | arc: | TODO | + | arm: | TODO | + | arm64: | TODO | + | c6x: | TODO | + | h8300: | TODO | + | hexagon: | TODO | + | ia64: | TODO | + | m68k: | TODO | + | microblaze: | TODO | + | mips: | ok | + | nds32: | TODO | + | nios2: | TODO | + | openrisc: | TODO | + | parisc: | TODO | + | powerpc: | ok | + | riscv: | TODO | + | s390: | TODO | + | sh: | TODO | + | sparc: | ok | + | um: | TODO | + | unicore32: | TODO | + | x86: | TODO | + | xtensa: | TODO | + ----------------------- diff --git a/Documentation/features/core/eBPF-JIT/arch-support.txt b/Documentation/features/core/eBPF-JIT/arch-support.txt new file mode 100644 index 0000000000000000000000000000000000000000..c90a0382fe667fb8c572d22af08083c22341e863 --- /dev/null +++ b/Documentation/features/core/eBPF-JIT/arch-support.txt @@ -0,0 +1,33 @@ +# +# Feature name: eBPF-JIT +# Kconfig: HAVE_EBPF_JIT +# description: arch supports eBPF JIT optimizations +# + ----------------------- + | arch |status| + ----------------------- + | alpha: | TODO | + | arc: | TODO | + | arm: | ok | + | arm64: | ok | + | c6x: | TODO | + | h8300: | TODO | + | hexagon: | TODO | + | ia64: | TODO | + | m68k: | TODO | + | microblaze: | TODO | + | mips: | ok | + | nds32: | TODO | + | nios2: | TODO | + | openrisc: | TODO | + | parisc: | TODO | + | powerpc: | ok | + | riscv: | TODO | + | s390: | ok | + | sh: | TODO | + | sparc: | ok | + | um: | TODO | + | unicore32: | TODO | + | x86: | ok | + | xtensa: | TODO | + ----------------------- diff --git a/Documentation/features/core/generic-idle-thread/arch-support.txt b/Documentation/features/core/generic-idle-thread/arch-support.txt index 372a2b18a617245e4fe026c66f3e8cb4a8202d1f..0ef6acdb991c7078b4d7337a84075f6e16290f4a 100644 --- a/Documentation/features/core/generic-idle-thread/arch-support.txt +++ b/Documentation/features/core/generic-idle-thread/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | - | openrisc: | TODO | + | openrisc: | ok | | parisc: | ok | | powerpc: | ok | + | riscv: | ok | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/core/jump-labels/arch-support.txt b/Documentation/features/core/jump-labels/arch-support.txt index ad97217b003babbbea4c4dc9351e56cfc71f5e96..27cbd63abfd2832909b69be27e9c1324adaca09b 100644 --- a/Documentation/features/core/jump-labels/arch-support.txt +++ b/Documentation/features/core/jump-labels/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | TODO | | sparc: | ok | diff --git a/Documentation/features/core/tracehook/arch-support.txt b/Documentation/features/core/tracehook/arch-support.txt index 36ee7bef5d18918140d5fe7f304b58de27ec2e4e..f44c274e40ede915bfa8ccac3e4e97be79744be0 100644 --- a/Documentation/features/core/tracehook/arch-support.txt +++ b/Documentation/features/core/tracehook/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | ok | | nios2: | ok | | openrisc: | ok | | parisc: | ok | | powerpc: | ok | + | riscv: | ok | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/debug/KASAN/arch-support.txt b/Documentation/features/debug/KASAN/arch-support.txt index f5c99fa576d33538b86de8ca52f627fe22d6b5f0..282ecc8ea1da44a68e7892b9f8a17ab466084864 100644 --- a/Documentation/features/debug/KASAN/arch-support.txt +++ b/Documentation/features/debug/KASAN/arch-support.txt @@ -17,15 +17,17 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | | um: | TODO | | unicore32: | TODO | - | x86: | ok | 64-bit only + | x86: | ok | | xtensa: | ok | ----------------------- diff --git a/Documentation/features/debug/gcov-profile-all/arch-support.txt b/Documentation/features/debug/gcov-profile-all/arch-support.txt index 5170a9934843e351d1a4e4d1b73fd677f1ed73df..01b2b3004e0add62c84883811d0db17c978bd529 100644 --- a/Documentation/features/debug/gcov-profile-all/arch-support.txt +++ b/Documentation/features/debug/gcov-profile-all/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | ok | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | ok | | sparc: | TODO | diff --git a/Documentation/features/debug/kgdb/arch-support.txt b/Documentation/features/debug/kgdb/arch-support.txt index 13b6e994ae1fcea4f39eebc375bdd6ea4b854a6d..3b4dff22329fb4147aa1f95b043367d4506aaaf7 100644 --- a/Documentation/features/debug/kgdb/arch-support.txt +++ b/Documentation/features/debug/kgdb/arch-support.txt @@ -11,16 +11,18 @@ | arm: | ok | | arm64: | ok | | c6x: | TODO | - | h8300: | TODO | + | h8300: | ok | | hexagon: | ok | | ia64: | TODO | | m68k: | TODO | | microblaze: | ok | | mips: | ok | + | nds32: | TODO | | nios2: | ok | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | TODO | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt index 419bb38820e7ca025313d5e8bb942aaefc251fcb..7e963d0ae6461d2fd37c63ae012734dc5be16e62 100644 --- a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt +++ b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/debug/kprobes/arch-support.txt b/Documentation/features/debug/kprobes/arch-support.txt index 52b3ace0a030add4d00dde2b79aa603d9c3fa0aa..4ada027faf169643dd79e26a0ebd359fd59b3b75 100644 --- a/Documentation/features/debug/kprobes/arch-support.txt +++ b/Documentation/features/debug/kprobes/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | ok | | arm: | ok | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | TODO | @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | ok | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/debug/kretprobes/arch-support.txt b/Documentation/features/debug/kretprobes/arch-support.txt index 180d2441951857f6c78985e518e69d3c6c0fc02f..044e13fcca5d956eb35803bc31b337e1b5ceab86 100644 --- a/Documentation/features/debug/kretprobes/arch-support.txt +++ b/Documentation/features/debug/kretprobes/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | ok | | arm: | ok | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | TODO | @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/debug/optprobes/arch-support.txt b/Documentation/features/debug/optprobes/arch-support.txt index 0a1241f45e41d60b776522a6c1913637b27d6125..dce7669c918f36ad2c264609554a8988a5b73ccd 100644 --- a/Documentation/features/debug/optprobes/arch-support.txt +++ b/Documentation/features/debug/optprobes/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | - | powerpc: | TODO | + | powerpc: | ok | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt index 5700195723834e8f5f13c9e660d8ffbe3216fa9c..954ac1c95553ef095040d3c862767aaf021cf7b5 100644 --- a/Documentation/features/debug/stackprotector/arch-support.txt +++ b/Documentation/features/debug/stackprotector/arch-support.txt @@ -1,6 +1,6 @@ # # Feature name: stackprotector -# Kconfig: HAVE_CC_STACKPROTECTOR +# Kconfig: HAVE_STACKPROTECTOR # description: arch supports compiler driven stack overflow protection # ----------------------- @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | ok | | sparc: | TODO | diff --git a/Documentation/features/debug/uprobes/arch-support.txt b/Documentation/features/debug/uprobes/arch-support.txt index 0b8d922eb799efefdfe4bd91b7d3b9029fad422f..1a3f9d3229bfea9377b18833d955792d083ca850 100644 --- a/Documentation/features/debug/uprobes/arch-support.txt +++ b/Documentation/features/debug/uprobes/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | ok | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | TODO | @@ -17,13 +17,15 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | TODO | - | sparc: | TODO | + | sparc: | ok | | um: | TODO | | unicore32: | TODO | | x86: | ok | diff --git a/Documentation/features/debug/user-ret-profiler/arch-support.txt b/Documentation/features/debug/user-ret-profiler/arch-support.txt index 13852ae62e9e1cc9daf8d07e742e647c1c18f140..1d78d1069a5fdd12d11df39d06db8d10ea2269de 100644 --- a/Documentation/features/debug/user-ret-profiler/arch-support.txt +++ b/Documentation/features/debug/user-ret-profiler/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/io/dma-api-debug/arch-support.txt b/Documentation/features/io/dma-api-debug/arch-support.txt deleted file mode 100644 index e438ed675623bc3b75a2059538fac66d8e7680d1..0000000000000000000000000000000000000000 --- a/Documentation/features/io/dma-api-debug/arch-support.txt +++ /dev/null @@ -1,31 +0,0 @@ -# -# Feature name: dma-api-debug -# Kconfig: HAVE_DMA_API_DEBUG -# description: arch supports DMA debug facilities -# - ----------------------- - | arch |status| - ----------------------- - | alpha: | TODO | - | arc: | TODO | - | arm: | ok | - | arm64: | ok | - | c6x: | ok | - | h8300: | TODO | - | hexagon: | TODO | - | ia64: | ok | - | m68k: | TODO | - | microblaze: | ok | - | mips: | ok | - | nios2: | TODO | - | openrisc: | TODO | - | parisc: | TODO | - | powerpc: | ok | - | s390: | ok | - | sh: | ok | - | sparc: | ok | - | um: | TODO | - | unicore32: | TODO | - | x86: | ok | - | xtensa: | ok | - ----------------------- diff --git a/Documentation/features/io/dma-contiguous/arch-support.txt b/Documentation/features/io/dma-contiguous/arch-support.txt index 47f64a433df001127cb64fdba339aecc5e3c7aa8..30c072d2b67cac8b74b0eb09afcb4967f7ed4ec4 100644 --- a/Documentation/features/io/dma-contiguous/arch-support.txt +++ b/Documentation/features/io/dma-contiguous/arch-support.txt @@ -17,11 +17,13 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | - | s390: | TODO | + | riscv: | ok | + | s390: | ok | | sh: | TODO | | sparc: | TODO | | um: | TODO | diff --git a/Documentation/features/io/sg-chain/arch-support.txt b/Documentation/features/io/sg-chain/arch-support.txt index 07f357fadbff673fabe1dbe6c842f5aa08391e42..6554f0372c3fc534e1f7f86953b426a2a9551f5d 100644 --- a/Documentation/features/io/sg-chain/arch-support.txt +++ b/Documentation/features/io/sg-chain/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | TODO | | sparc: | ok | diff --git a/Documentation/features/lib/strncasecmp/arch-support.txt b/Documentation/features/lib/strncasecmp/arch-support.txt deleted file mode 100644 index 4f3a6a0e4e683d094b687fffee13ba0bec278e81..0000000000000000000000000000000000000000 --- a/Documentation/features/lib/strncasecmp/arch-support.txt +++ /dev/null @@ -1,31 +0,0 @@ -# -# Feature name: strncasecmp -# Kconfig: __HAVE_ARCH_STRNCASECMP -# description: arch provides an optimized strncasecmp() function -# - ----------------------- - | arch |status| - ----------------------- - | alpha: | TODO | - | arc: | TODO | - | arm: | TODO | - | arm64: | TODO | - | c6x: | TODO | - | h8300: | TODO | - | hexagon: | TODO | - | ia64: | TODO | - | m68k: | TODO | - | microblaze: | TODO | - | mips: | TODO | - | nios2: | TODO | - | openrisc: | TODO | - | parisc: | TODO | - | powerpc: | TODO | - | s390: | TODO | - | sh: | TODO | - | sparc: | TODO | - | um: | TODO | - | unicore32: | TODO | - | x86: | TODO | - | xtensa: | TODO | - ----------------------- diff --git a/Documentation/features/locking/cmpxchg-local/arch-support.txt b/Documentation/features/locking/cmpxchg-local/arch-support.txt index 482a0b09d1f89c1d82685009cee66ab730230b04..51704a2dc8d17f22ed50633c4181782f943b95d6 100644 --- a/Documentation/features/locking/cmpxchg-local/arch-support.txt +++ b/Documentation/features/locking/cmpxchg-local/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | TODO | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | TODO | @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | ok | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/locking/lockdep/arch-support.txt b/Documentation/features/locking/lockdep/arch-support.txt index bb35c5ba62866341b662638cafad61c8257ab85e..bd39c5edd460736c2b01dbda9e6e95c56d986481 100644 --- a/Documentation/features/locking/lockdep/arch-support.txt +++ b/Documentation/features/locking/lockdep/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | ok | | mips: | ok | + | nds32: | ok | | nios2: | TODO | - | openrisc: | TODO | + | openrisc: | ok | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt b/Documentation/features/locking/queued-rwlocks/arch-support.txt index 627e9a6b2db98d79abc521c1657d521270c74f84..da7aff3bee0b332e9b2417f3c0b1a75a183e64bb 100644 --- a/Documentation/features/locking/queued-rwlocks/arch-support.txt +++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt @@ -9,21 +9,23 @@ | alpha: | TODO | | arc: | TODO | | arm: | TODO | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | TODO | | ia64: | TODO | | m68k: | TODO | | microblaze: | TODO | - | mips: | TODO | + | mips: | ok | + | nds32: | TODO | | nios2: | TODO | - | openrisc: | TODO | + | openrisc: | ok | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | - | sparc: | TODO | + | sparc: | ok | | um: | TODO | | unicore32: | TODO | | x86: | ok | diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt index 9edda216cdfbf2d99a9ae084467dc2f7c0ef387f..478e9101322c428ef59db37824f041f144a76677 100644 --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt @@ -16,14 +16,16 @@ | ia64: | TODO | | m68k: | TODO | | microblaze: | TODO | - | mips: | TODO | + | mips: | ok | + | nds32: | TODO | | nios2: | TODO | - | openrisc: | TODO | + | openrisc: | ok | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | - | sparc: | TODO | + | sparc: | ok | | um: | TODO | | unicore32: | TODO | | x86: | ok | diff --git a/Documentation/features/locking/rwsem-optimized/arch-support.txt b/Documentation/features/locking/rwsem-optimized/arch-support.txt index 8d9afb10b16e7cc301659a221841fffeefad431b..e54b1f1a8091d82891e3c87324a1b221867869b3 100644 --- a/Documentation/features/locking/rwsem-optimized/arch-support.txt +++ b/Documentation/features/locking/rwsem-optimized/arch-support.txt @@ -1,6 +1,6 @@ # # Feature name: rwsem-optimized -# Kconfig: Optimized asm/rwsem.h +# Kconfig: !RWSEM_GENERIC_SPINLOCK # description: arch provides optimized rwsem APIs # ----------------------- @@ -8,8 +8,8 @@ ----------------------- | alpha: | ok | | arc: | TODO | - | arm: | TODO | - | arm64: | TODO | + | arm: | ok | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | TODO | @@ -17,14 +17,16 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | ok | | sh: | ok | | sparc: | ok | - | um: | TODO | + | um: | ok | | unicore32: | TODO | | x86: | ok | | xtensa: | ok | diff --git a/Documentation/features/perf/kprobes-event/arch-support.txt b/Documentation/features/perf/kprobes-event/arch-support.txt index d01239ee34b3495f0319e52807c210ff568dcb9d..7331402d188720e78dfbe66fc7e42eef275a8997 100644 --- a/Documentation/features/perf/kprobes-event/arch-support.txt +++ b/Documentation/features/perf/kprobes-event/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | ok | - | arm64: | TODO | + | arm64: | ok | | c6x: | TODO | | h8300: | TODO | | hexagon: | ok | @@ -17,13 +17,15 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | ok | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | ok | - | sparc: | TODO | + | sparc: | ok | | um: | TODO | | unicore32: | TODO | | x86: | ok | diff --git a/Documentation/features/perf/perf-regs/arch-support.txt b/Documentation/features/perf/perf-regs/arch-support.txt index 458faba5311ab89fa77140bdd8e01fc7c6fcfb1e..53feeee6cdad927c8f2e7cf6c33b455644ff9fa6 100644 --- a/Documentation/features/perf/perf-regs/arch-support.txt +++ b/Documentation/features/perf/perf-regs/arch-support.txt @@ -17,11 +17,13 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | - | s390: | TODO | + | riscv: | TODO | + | s390: | ok | | sh: | TODO | | sparc: | TODO | | um: | TODO | diff --git a/Documentation/features/perf/perf-stackdump/arch-support.txt b/Documentation/features/perf/perf-stackdump/arch-support.txt index 545d01c69c88f0d008c4797b20df251098527385..16164348e0ea3321aa7e3d84f75fc21632724c37 100644 --- a/Documentation/features/perf/perf-stackdump/arch-support.txt +++ b/Documentation/features/perf/perf-stackdump/arch-support.txt @@ -17,11 +17,13 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | - | s390: | TODO | + | riscv: | TODO | + | s390: | ok | | sh: | TODO | | sparc: | TODO | | um: | TODO | diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt index 85a6c9d4571ce825615b42ac6ba8eb1921c9a5aa..dbdf629077037acd99fe18bdced8f71b7eb54639 100644 --- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt +++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt @@ -40,10 +40,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/sched/numa-balancing/arch-support.txt b/Documentation/features/sched/numa-balancing/arch-support.txt index 3475088638726b4151c89c46ba57a6f726c7f911..c68bb2c2cb626e1f814afde3bb5e8afb4d82e049 100644 --- a/Documentation/features/sched/numa-balancing/arch-support.txt +++ b/Documentation/features/sched/numa-balancing/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | .. | | arm: | .. | - | arm64: | .. | + | arm64: | ok | | c6x: | .. | | h8300: | .. | | hexagon: | .. | @@ -17,11 +17,13 @@ | m68k: | .. | | microblaze: | .. | | mips: | TODO | + | nds32: | TODO | | nios2: | .. | | openrisc: | .. | | parisc: | .. | | powerpc: | ok | - | s390: | .. | + | riscv: | TODO | + | s390: | ok | | sh: | .. | | sparc: | TODO | | um: | .. | diff --git a/Documentation/features/scripts/features-refresh.sh b/Documentation/features/scripts/features-refresh.sh new file mode 100755 index 0000000000000000000000000000000000000000..9e72d38a0720ebd0f37f50bdec1e866a21fea62a --- /dev/null +++ b/Documentation/features/scripts/features-refresh.sh @@ -0,0 +1,98 @@ +# +# Small script that refreshes the kernel feature support status in place. +# + +for F_FILE in Documentation/features/*/*/arch-support.txt; do + F=$(grep "^# Kconfig:" "$F_FILE" | cut -c26-) + + # + # Each feature F is identified by a pair (O, K), where 'O' can + # be either the empty string (for 'nop') or "not" (the logical + # negation operator '!'); other operators are not supported. + # + O="" + K=$F + if [[ "$F" == !* ]]; then + O="not" + K=$(echo $F | sed -e 's/^!//g') + fi + + # + # F := (O, K) is 'valid' iff there is a Kconfig file (for some + # arch) which contains K. + # + # Notice that this definition entails an 'asymmetry' between + # the case 'O = ""' and the case 'O = "not"'. E.g., F may be + # _invalid_ if: + # + # [case 'O = ""'] + # 1) no arch provides support for F, + # 2) K does not exist (e.g., it was renamed/mis-typed); + # + # [case 'O = "not"'] + # 3) all archs provide support for F, + # 4) as in (2). + # + # The rationale for adopting this definition (and, thus, for + # keeping the asymmetry) is: + # + # We want to be able to 'detect' (2) (or (4)). + # + # (1) and (3) may further warn the developers about the fact + # that K can be removed. + # + F_VALID="false" + for ARCH_DIR in arch/*/; do + K_FILES=$(find $ARCH_DIR -name "Kconfig*") + K_GREP=$(grep "$K" $K_FILES) + if [ ! -z "$K_GREP" ]; then + F_VALID="true" + break + fi + done + if [ "$F_VALID" = "false" ]; then + printf "WARNING: '%s' is not a valid Kconfig\n" "$F" + fi + + T_FILE="$F_FILE.tmp" + grep "^#" $F_FILE > $T_FILE + echo " -----------------------" >> $T_FILE + echo " | arch |status|" >> $T_FILE + echo " -----------------------" >> $T_FILE + for ARCH_DIR in arch/*/; do + ARCH=$(echo $ARCH_DIR | sed -e 's/arch//g' | sed -e 's/\///g') + K_FILES=$(find $ARCH_DIR -name "Kconfig*") + K_GREP=$(grep "$K" $K_FILES) + # + # Arch support status values for (O, K) are updated according + # to the following rules. + # + # - ("", K) is 'supported by a given arch', if there is a + # Kconfig file for that arch which contains K; + # + # - ("not", K) is 'supported by a given arch', if there is + # no Kconfig file for that arch which contains K; + # + # - otherwise: preserve the previous status value (if any), + # default to 'not yet supported'. + # + # Notice that, according these rules, invalid features may be + # updated/modified. + # + if [ "$O" = "" ] && [ ! -z "$K_GREP" ]; then + printf " |%12s: | ok |\n" "$ARCH" >> $T_FILE + elif [ "$O" = "not" ] && [ -z "$K_GREP" ]; then + printf " |%12s: | ok |\n" "$ARCH" >> $T_FILE + else + S=$(grep -v "^#" "$F_FILE" | grep " $ARCH:") + if [ ! -z "$S" ]; then + echo "$S" >> $T_FILE + else + printf " |%12s: | TODO |\n" "$ARCH" \ + >> $T_FILE + fi + fi + done + echo " -----------------------" >> $T_FILE + mv $T_FILE $F_FILE +done diff --git a/Documentation/features/seccomp/seccomp-filter/arch-support.txt b/Documentation/features/seccomp/seccomp-filter/arch-support.txt index e4fad58a05e514b311787a296f8b3551d42e2de9..d4271b493b41977c6f0c5b16f2501f3f6e7301ce 100644 --- a/Documentation/features/seccomp/seccomp-filter/arch-support.txt +++ b/Documentation/features/seccomp/seccomp-filter/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | - | parisc: | TODO | - | powerpc: | TODO | + | parisc: | ok | + | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/time/arch-tick-broadcast/arch-support.txt b/Documentation/features/time/arch-tick-broadcast/arch-support.txt index 8052904b25fc762dcaa674fde848b931322e9453..83d9e68462bbf27ec1ad4c32d552403d28d4d084 100644 --- a/Documentation/features/time/arch-tick-broadcast/arch-support.txt +++ b/Documentation/features/time/arch-tick-broadcast/arch-support.txt @@ -17,12 +17,14 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | TODO | - | sh: | TODO | + | sh: | ok | | sparc: | TODO | | um: | TODO | | unicore32: | TODO | diff --git a/Documentation/features/time/clockevents/arch-support.txt b/Documentation/features/time/clockevents/arch-support.txt index 7c76b946297e9b93b19cbe3de40bc232c0335636..3d4908fce6da848b451e8a5e1f65940be4233603 100644 --- a/Documentation/features/time/clockevents/arch-support.txt +++ b/Documentation/features/time/clockevents/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | ok | | microblaze: | ok | | mips: | ok | + | nds32: | ok | | nios2: | ok | | openrisc: | ok | - | parisc: | TODO | + | parisc: | ok | | powerpc: | ok | + | riscv: | ok | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/time/context-tracking/arch-support.txt b/Documentation/features/time/context-tracking/arch-support.txt index 9433b3e523b39024dde09979a36192d699e0d254..c29974afffaa52addab00abd35d92ed8611e7648 100644 --- a/Documentation/features/time/context-tracking/arch-support.txt +++ b/Documentation/features/time/context-tracking/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | ok | diff --git a/Documentation/features/time/irq-time-acct/arch-support.txt b/Documentation/features/time/irq-time-acct/arch-support.txt index 212dde0b578c808a4dca30ab6727d55c2897369e..8d73c463ec27a3e1ad6ec8ea4536650d4ab359ad 100644 --- a/Documentation/features/time/irq-time-acct/arch-support.txt +++ b/Documentation/features/time/irq-time-acct/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | .. | - | powerpc: | .. | + | powerpc: | ok | + | riscv: | TODO | | s390: | .. | | sh: | TODO | | sparc: | .. | diff --git a/Documentation/features/time/modern-timekeeping/arch-support.txt b/Documentation/features/time/modern-timekeeping/arch-support.txt index 4074028f72f7276465a669f60affd2d0d1a2863c..e7c6ea6b8fb3238a46025dcd7a6e89323829b287 100644 --- a/Documentation/features/time/modern-timekeeping/arch-support.txt +++ b/Documentation/features/time/modern-timekeeping/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | ok | | mips: | ok | + | nds32: | ok | | nios2: | ok | | openrisc: | ok | | parisc: | ok | | powerpc: | ok | + | riscv: | ok | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/time/virt-cpuacct/arch-support.txt b/Documentation/features/time/virt-cpuacct/arch-support.txt index a394d8820517bf8346a07b87a69d2d64a014de5f..4646457461cf8b81234c554a6c4e3d91db6672c5 100644 --- a/Documentation/features/time/virt-cpuacct/arch-support.txt +++ b/Documentation/features/time/virt-cpuacct/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | ok | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | TODO | | sparc: | ok | diff --git a/Documentation/features/vm/ELF-ASLR/arch-support.txt b/Documentation/features/vm/ELF-ASLR/arch-support.txt index 082f93d5b40ed473729f3c03f49261aff75c05b3..1f71d090ff2c8b14d95e0d0350d8cbdfa649eb3d 100644 --- a/Documentation/features/vm/ELF-ASLR/arch-support.txt +++ b/Documentation/features/vm/ELF-ASLR/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | ok | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | - | parisc: | TODO | + | parisc: | ok | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/vm/PG_uncached/arch-support.txt b/Documentation/features/vm/PG_uncached/arch-support.txt index 605e0abb756d834f5660643bd02815a36700bcac..fbd5aa463b0a146c12959e4de13b94614e19c4d8 100644 --- a/Documentation/features/vm/PG_uncached/arch-support.txt +++ b/Documentation/features/vm/PG_uncached/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/vm/THP/arch-support.txt b/Documentation/features/vm/THP/arch-support.txt index 7a8eb0bd5ca843c136c78e403d77a724b2fbb937..5d7ecc378f29e53175de5653bf926b1c19139d42 100644 --- a/Documentation/features/vm/THP/arch-support.txt +++ b/Documentation/features/vm/THP/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | .. | | microblaze: | .. | | mips: | ok | + | nds32: | TODO | | nios2: | .. | | openrisc: | .. | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | .. | | sparc: | ok | diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt index 35fb99b2b3ea1edc934ad3c98497b5c721a3e2cd..f7af9678eb660f87956d8e220eb12a58ba84af5a 100644 --- a/Documentation/features/vm/TLB/arch-support.txt +++ b/Documentation/features/vm/TLB/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | .. | | microblaze: | .. | | mips: | TODO | + | nds32: | TODO | | nios2: | .. | | openrisc: | .. | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/vm/huge-vmap/arch-support.txt b/Documentation/features/vm/huge-vmap/arch-support.txt index ed8b943ad8fc8347569e67f926c56323d5754392..d0713ccc7117e680b2c71795efbc1d0d95586e71 100644 --- a/Documentation/features/vm/huge-vmap/arch-support.txt +++ b/Documentation/features/vm/huge-vmap/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | TODO | + | riscv: | TODO | | s390: | TODO | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/vm/ioremap_prot/arch-support.txt b/Documentation/features/vm/ioremap_prot/arch-support.txt index 589947bdf0a8ac2a16ed03c7028ee2b993699ea3..8527601a3739d38f5fe7d3f0d80d2e21d31d304b 100644 --- a/Documentation/features/vm/ioremap_prot/arch-support.txt +++ b/Documentation/features/vm/ioremap_prot/arch-support.txt @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | TODO | | sh: | ok | | sparc: | TODO | diff --git a/Documentation/features/vm/numa-memblock/arch-support.txt b/Documentation/features/vm/numa-memblock/arch-support.txt index 8b8bea0318a0d93cb7a31d302792342a5e6cbe49..1a988052cd24a5203c8c4462c4ddf7245c161b40 100644 --- a/Documentation/features/vm/numa-memblock/arch-support.txt +++ b/Documentation/features/vm/numa-memblock/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | .. | | arm: | .. | - | arm64: | .. | + | arm64: | ok | | c6x: | .. | | h8300: | .. | | hexagon: | .. | @@ -17,10 +17,12 @@ | m68k: | .. | | microblaze: | ok | | mips: | ok | + | nds32: | TODO | | nios2: | .. | | openrisc: | .. | | parisc: | .. | | powerpc: | ok | + | riscv: | ok | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/features/vm/pte_special/arch-support.txt b/Documentation/features/vm/pte_special/arch-support.txt index 055004f467d2ccb2bbab3cc01d5c447573d18b25..a8378424bc98450563e14cce59da34008001ca5d 100644 --- a/Documentation/features/vm/pte_special/arch-support.txt +++ b/Documentation/features/vm/pte_special/arch-support.txt @@ -1,6 +1,6 @@ # # Feature name: pte_special -# Kconfig: __HAVE_ARCH_PTE_SPECIAL +# Kconfig: ARCH_HAS_PTE_SPECIAL # description: arch supports the pte_special()/pte_mkspecial() VM APIs # ----------------------- @@ -17,10 +17,12 @@ | m68k: | TODO | | microblaze: | TODO | | mips: | TODO | + | nds32: | TODO | | nios2: | TODO | | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | + | riscv: | TODO | | s390: | ok | | sh: | ok | | sparc: | ok | diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index b7bd6c9009ccb1485d95075221c0ce8dc016ebe2..0937bade1099d66d0da25a3ddcbb66562f1ba322 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -10,8 +10,8 @@ afs.txt - info and examples for the distributed AFS (Andrew File System) fs. affs.txt - info and mount options for the Amiga Fast File System. -autofs4-mount-control.txt - - info on device control operations for autofs4 module. +autofs-mount-control.txt + - info on device control operations for autofs module. automount-support.txt - information about filesystem automount support. befs.txt @@ -89,8 +89,6 @@ locks.txt - info on file locking implementations, flock() vs. fcntl(), etc. mandatory-locking.txt - info on the Linux implementation of Sys V mandatory file locking. -ncpfs.txt - - info on Novell Netware(tm) filesystem using NCP protocol. nfs/ - nfs-related documentation. nilfs2.txt diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 75d2d57e2c4421122434aa855444e8a344c38f21..37bf0a9de75cbe79794e653ff161e4a5eb37a97a 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -69,31 +69,31 @@ prototypes: locking rules: all may block - i_mutex(inode) -lookup: yes -create: yes -link: yes (both) -mknod: yes -symlink: yes -mkdir: yes -unlink: yes (both) -rmdir: yes (both) (see below) -rename: yes (all) (see below) + i_rwsem(inode) +lookup: shared +create: exclusive +link: exclusive (both) +mknod: exclusive +symlink: exclusive +mkdir: exclusive +unlink: exclusive (both) +rmdir: exclusive (both)(see below) +rename: exclusive (all) (see below) readlink: no get_link: no -setattr: yes +setattr: exclusive permission: no (may not block if called in rcu-walk mode) get_acl: no getattr: no listxattr: no fiemap: no update_time: no -atomic_open: yes +atomic_open: exclusive tmpfile: no - Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_mutex on -victim. + Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem + exclusive on victim. cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem. See Documentation/filesystems/directory-locking for more detailed discussion @@ -111,10 +111,10 @@ prototypes: locking rules: all may block - i_mutex(inode) + i_rwsem(inode) list: no get: no -set: yes +set: exclusive --------------------------- super_operations --------------------------- prototypes: @@ -217,14 +217,14 @@ prototypes: locking rules: All except set_page_dirty and freepage may block - PageLocked(page) i_mutex + PageLocked(page) i_rwsem writepage: yes, unlocks (see below) readpage: yes, unlocks writepages: set_page_dirty no readpages: -write_begin: locks the page yes -write_end: yes, unlocks yes +write_begin: locks the page exclusive +write_end: yes, unlocks exclusive bmap: invalidatepage: yes releasepage: yes @@ -439,7 +439,8 @@ prototypes: ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); int (*iterate) (struct file *, struct dir_context *); - unsigned int (*poll) (struct file *, struct poll_table_struct *); + int (*iterate_shared) (struct file *, struct dir_context *); + __poll_t (*poll) (struct file *, struct poll_table_struct *); long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*compat_ioctl) (struct file *, unsigned int, unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); @@ -480,6 +481,10 @@ mutex or just to use i_size_read() instead. Note: this does not protect the file->f_pos against concurrent modifications since this is something the userspace has to take care about. +->iterate() is called with i_rwsem exclusive. + +->iterate_shared() is called with i_rwsem at least shared. + ->fasync() is responsible for maintaining the FASYNC bit in filp->f_flags. Most instances call fasync_helper(), which does that maintenance, so it's not normally something one needs to worry about. Return values > 0 will be diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt new file mode 100644 index 0000000000000000000000000000000000000000..45edad6933cc4fe151455a44734196688e40cf2a --- /dev/null +++ b/Documentation/filesystems/autofs-mount-control.txt @@ -0,0 +1,406 @@ + +Miscellaneous Device control operations for the autofs kernel module +==================================================================== + +The problem +=========== + +There is a problem with active restarts in autofs (that is to say +restarting autofs when there are busy mounts). + +During normal operation autofs uses a file descriptor opened on the +directory that is being managed in order to be able to issue control +operations. Using a file descriptor gives ioctl operations access to +autofs specific information stored in the super block. The operations +are things such as setting an autofs mount catatonic, setting the +expire timeout and requesting expire checks. As is explained below, +certain types of autofs triggered mounts can end up covering an autofs +mount itself which prevents us being able to use open(2) to obtain a +file descriptor for these operations if we don't already have one open. + +Currently autofs uses "umount -l" (lazy umount) to clear active mounts +at restart. While using lazy umount works for most cases, anything that +needs to walk back up the mount tree to construct a path, such as +getcwd(2) and the proc file system /proc//cwd, no longer works +because the point from which the path is constructed has been detached +from the mount tree. + +The actual problem with autofs is that it can't reconnect to existing +mounts. Immediately one thinks of just adding the ability to remount +autofs file systems would solve it, but alas, that can't work. This is +because autofs direct mounts and the implementation of "on demand mount +and expire" of nested mount trees have the file system mounted directly +on top of the mount trigger directory dentry. + +For example, there are two types of automount maps, direct (in the kernel +module source you will see a third type called an offset, which is just +a direct mount in disguise) and indirect. + +Here is a master map with direct and indirect map entries: + +/- /etc/auto.direct +/test /etc/auto.indirect + +and the corresponding map files: + +/etc/auto.direct: + +/automount/dparse/g6 budgie:/autofs/export1 +/automount/dparse/g1 shark:/autofs/export1 +and so on. + +/etc/auto.indirect: + +g1 shark:/autofs/export1 +g6 budgie:/autofs/export1 +and so on. + +For the above indirect map an autofs file system is mounted on /test and +mounts are triggered for each sub-directory key by the inode lookup +operation. So we see a mount of shark:/autofs/export1 on /test/g1, for +example. + +The way that direct mounts are handled is by making an autofs mount on +each full path, such as /automount/dparse/g1, and using it as a mount +trigger. So when we walk on the path we mount shark:/autofs/export1 "on +top of this mount point". Since these are always directories we can +use the follow_link inode operation to trigger the mount. + +But, each entry in direct and indirect maps can have offsets (making +them multi-mount map entries). + +For example, an indirect mount map entry could also be: + +g1 \ + / shark:/autofs/export5/testing/test \ + /s1 shark:/autofs/export/testing/test/s1 \ + /s2 shark:/autofs/export5/testing/test/s2 \ + /s1/ss1 shark:/autofs/export1 \ + /s2/ss2 shark:/autofs/export2 + +and a similarly a direct mount map entry could also be: + +/automount/dparse/g1 \ + / shark:/autofs/export5/testing/test \ + /s1 shark:/autofs/export/testing/test/s1 \ + /s2 shark:/autofs/export5/testing/test/s2 \ + /s1/ss1 shark:/autofs/export2 \ + /s2/ss2 shark:/autofs/export2 + +One of the issues with version 4 of autofs was that, when mounting an +entry with a large number of offsets, possibly with nesting, we needed +to mount and umount all of the offsets as a single unit. Not really a +problem, except for people with a large number of offsets in map entries. +This mechanism is used for the well known "hosts" map and we have seen +cases (in 2.4) where the available number of mounts are exhausted or +where the number of privileged ports available is exhausted. + +In version 5 we mount only as we go down the tree of offsets and +similarly for expiring them which resolves the above problem. There is +somewhat more detail to the implementation but it isn't needed for the +sake of the problem explanation. The one important detail is that these +offsets are implemented using the same mechanism as the direct mounts +above and so the mount points can be covered by a mount. + +The current autofs implementation uses an ioctl file descriptor opened +on the mount point for control operations. The references held by the +descriptor are accounted for in checks made to determine if a mount is +in use and is also used to access autofs file system information held +in the mount super block. So the use of a file handle needs to be +retained. + + +The Solution +============ + +To be able to restart autofs leaving existing direct, indirect and +offset mounts in place we need to be able to obtain a file handle +for these potentially covered autofs mount points. Rather than just +implement an isolated operation it was decided to re-implement the +existing ioctl interface and add new operations to provide this +functionality. + +In addition, to be able to reconstruct a mount tree that has busy mounts, +the uid and gid of the last user that triggered the mount needs to be +available because these can be used as macro substitution variables in +autofs maps. They are recorded at mount request time and an operation +has been added to retrieve them. + +Since we're re-implementing the control interface, a couple of other +problems with the existing interface have been addressed. First, when +a mount or expire operation completes a status is returned to the +kernel by either a "send ready" or a "send fail" operation. The +"send fail" operation of the ioctl interface could only ever send +ENOENT so the re-implementation allows user space to send an actual +status. Another expensive operation in user space, for those using +very large maps, is discovering if a mount is present. Usually this +involves scanning /proc/mounts and since it needs to be done quite +often it can introduce significant overhead when there are many entries +in the mount table. An operation to lookup the mount status of a mount +point dentry (covered or not) has also been added. + +Current kernel development policy recommends avoiding the use of the +ioctl mechanism in favor of systems such as Netlink. An implementation +using this system was attempted to evaluate its suitability and it was +found to be inadequate, in this case. The Generic Netlink system was +used for this as raw Netlink would lead to a significant increase in +complexity. There's no question that the Generic Netlink system is an +elegant solution for common case ioctl functions but it's not a complete +replacement probably because its primary purpose in life is to be a +message bus implementation rather than specifically an ioctl replacement. +While it would be possible to work around this there is one concern +that lead to the decision to not use it. This is that the autofs +expire in the daemon has become far to complex because umount +candidates are enumerated, almost for no other reason than to "count" +the number of times to call the expire ioctl. This involves scanning +the mount table which has proved to be a big overhead for users with +large maps. The best way to improve this is try and get back to the +way the expire was done long ago. That is, when an expire request is +issued for a mount (file handle) we should continually call back to +the daemon until we can't umount any more mounts, then return the +appropriate status to the daemon. At the moment we just expire one +mount at a time. A Generic Netlink implementation would exclude this +possibility for future development due to the requirements of the +message bus architecture. + + +autofs Miscellaneous Device mount control interface +==================================================== + +The control interface is opening a device node, typically /dev/autofs. + +All the ioctls use a common structure to pass the needed parameter +information and return operation results: + +struct autofs_dev_ioctl { + __u32 ver_major; + __u32 ver_minor; + __u32 size; /* total size of data passed in + * including this struct */ + __s32 ioctlfd; /* automount command fd */ + + /* Command parameters */ + union { + struct args_protover protover; + struct args_protosubver protosubver; + struct args_openmount openmount; + struct args_ready ready; + struct args_fail fail; + struct args_setpipefd setpipefd; + struct args_timeout timeout; + struct args_requester requester; + struct args_expire expire; + struct args_askumount askumount; + struct args_ismountpoint ismountpoint; + }; + + char path[0]; +}; + +The ioctlfd field is a mount point file descriptor of an autofs mount +point. It is returned by the open call and is used by all calls except +the check for whether a given path is a mount point, where it may +optionally be used to check a specific mount corresponding to a given +mount point file descriptor, and when requesting the uid and gid of the +last successful mount on a directory within the autofs file system. + +The union is used to communicate parameters and results of calls made +as described below. + +The path field is used to pass a path where it is needed and the size field +is used account for the increased structure length when translating the +structure sent from user space. + +This structure can be initialized before setting specific fields by using +the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *). + +All of the ioctls perform a copy of this structure from user space to +kernel space and return -EINVAL if the size parameter is smaller than +the structure size itself, -ENOMEM if the kernel memory allocation fails +or -EFAULT if the copy itself fails. Other checks include a version check +of the compiled in user space version against the module version and a +mismatch results in a -EINVAL return. If the size field is greater than +the structure size then a path is assumed to be present and is checked to +ensure it begins with a "/" and is NULL terminated, otherwise -EINVAL is +returned. Following these checks, for all ioctl commands except +AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and +AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is +not a valid descriptor or doesn't correspond to an autofs mount point +an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is +returned. + + +The ioctls +========== + +An example of an implementation which uses this interface can be seen +in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the +distribution tar available for download from kernel.org in directory +/pub/linux/daemons/autofs/v5. + +The device node ioctl operations implemented by this interface are: + + +AUTOFS_DEV_IOCTL_VERSION +------------------------ + +Get the major and minor version of the autofs device ioctl kernel module +implementation. It requires an initialized struct autofs_dev_ioctl as an +input parameter and sets the version information in the passed in structure. +It returns 0 on success or the error -EINVAL if a version mismatch is +detected. + + +AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD +------------------------------------------------------------------ + +Get the major and minor version of the autofs protocol version understood +by loaded module. This call requires an initialized struct autofs_dev_ioctl +with the ioctlfd field set to a valid autofs mount point descriptor +and sets the requested version number in version field of struct args_protover +or sub_version field of struct args_protosubver. These commands return +0 on success or one of the negative error codes if validation fails. + + +AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT +---------------------------------------------------------- + +Obtain and release a file descriptor for an autofs managed mount point +path. The open call requires an initialized struct autofs_dev_ioctl with +the path field set and the size field adjusted appropriately as well +as the devid field of struct args_openmount set to the device number of +the autofs mount. The device number can be obtained from the mount options +shown in /proc/mounts. The close call requires an initialized struct +autofs_dev_ioct with the ioctlfd field set to the descriptor obtained +from the open call. The release of the file descriptor can also be done +with close(2) so any open descriptors will also be closed at process exit. +The close call is included in the implemented operations largely for +completeness and to provide for a consistent user space implementation. + + +AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD +-------------------------------------------------------- + +Return mount and expire result status from user space to the kernel. +Both of these calls require an initialized struct autofs_dev_ioctl +with the ioctlfd field set to the descriptor obtained from the open +call and the token field of struct args_ready or struct args_fail set +to the wait queue token number, received by user space in the foregoing +mount or expire request. The status field of struct args_fail is set to +the errno of the operation. It is set to 0 on success. + + +AUTOFS_DEV_IOCTL_SETPIPEFD_CMD +------------------------------ + +Set the pipe file descriptor used for kernel communication to the daemon. +Normally this is set at mount time using an option but when reconnecting +to a existing mount we need to use this to tell the autofs mount about +the new kernel pipe descriptor. In order to protect mounts against +incorrectly setting the pipe descriptor we also require that the autofs +mount be catatonic (see next call). + +The call requires an initialized struct autofs_dev_ioctl with the +ioctlfd field set to the descriptor obtained from the open call and +the pipefd field of struct args_setpipefd set to descriptor of the pipe. +On success the call also sets the process group id used to identify the +controlling process (eg. the owning automount(8) daemon) to the process +group of the caller. + + +AUTOFS_DEV_IOCTL_CATATONIC_CMD +------------------------------ + +Make the autofs mount point catatonic. The autofs mount will no longer +issue mount requests, the kernel communication pipe descriptor is released +and any remaining waits in the queue released. + +The call requires an initialized struct autofs_dev_ioctl with the +ioctlfd field set to the descriptor obtained from the open call. + + +AUTOFS_DEV_IOCTL_TIMEOUT_CMD +---------------------------- + +Set the expire timeout for mounts within an autofs mount point. + +The call requires an initialized struct autofs_dev_ioctl with the +ioctlfd field set to the descriptor obtained from the open call. + + +AUTOFS_DEV_IOCTL_REQUESTER_CMD +------------------------------ + +Return the uid and gid of the last process to successfully trigger a the +mount on the given path dentry. + +The call requires an initialized struct autofs_dev_ioctl with the path +field set to the mount point in question and the size field adjusted +appropriately. Upon return the uid field of struct args_requester contains +the uid and gid field the gid. + +When reconstructing an autofs mount tree with active mounts we need to +re-connect to mounts that may have used the original process uid and +gid (or string variations of them) for mount lookups within the map entry. +This call provides the ability to obtain this uid and gid so they may be +used by user space for the mount map lookups. + + +AUTOFS_DEV_IOCTL_EXPIRE_CMD +--------------------------- + +Issue an expire request to the kernel for an autofs mount. Typically +this ioctl is called until no further expire candidates are found. + +The call requires an initialized struct autofs_dev_ioctl with the +ioctlfd field set to the descriptor obtained from the open call. In +addition an immediate expire, independent of the mount timeout, can be +requested by setting the how field of struct args_expire to 1. If no +expire candidates can be found the ioctl returns -1 with errno set to +EAGAIN. + +This call causes the kernel module to check the mount corresponding +to the given ioctlfd for mounts that can be expired, issues an expire +request back to the daemon and waits for completion. + +AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD +------------------------------ + +Checks if an autofs mount point is in use. + +The call requires an initialized struct autofs_dev_ioctl with the +ioctlfd field set to the descriptor obtained from the open call and +it returns the result in the may_umount field of struct args_askumount, +1 for busy and 0 otherwise. + + +AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD +--------------------------------- + +Check if the given path is a mountpoint. + +The call requires an initialized struct autofs_dev_ioctl. There are two +possible variations. Both use the path field set to the path of the mount +point to check and the size field adjusted appropriately. One uses the +ioctlfd field to identify a specific mount point to check while the other +variation uses the path and optionally in.type field of struct args_ismountpoint +set to an autofs mount type. The call returns 1 if this is a mount point +and sets out.devid field to the device number of the mount and out.magic +field to the relevant super block magic number (described below) or 0 if +it isn't a mountpoint. In both cases the the device number (as returned +by new_encode_dev()) is returned in out.devid field. + +If supplied with a file descriptor we're looking for a specific mount, +not necessarily at the top of the mounted stack. In this case the path +the descriptor corresponds to is considered a mountpoint if it is itself +a mountpoint or contains a mount, such as a multi-mount without a root +mount. In this case we return 1 if the descriptor corresponds to a mount +point and and also returns the super magic of the covering mount if there +is one or 0 if it isn't a mountpoint. + +If a path is supplied (and the ioctlfd field is set to -1) then the path +is looked up and is checked to see if it is the root of a mount. If a +type is also given we are looking for a particular autofs mount and if +a match isn't found a fail is returned. If the the located path is the +root of a mount 1 is returned along with the super magic of the mount +or 0 otherwise. diff --git a/Documentation/filesystems/autofs.txt b/Documentation/filesystems/autofs.txt new file mode 100644 index 0000000000000000000000000000000000000000..373ad25852d35bf40de7bbdb00a77f186e82e128 --- /dev/null +++ b/Documentation/filesystems/autofs.txt @@ -0,0 +1,529 @@ + + + + +autofs - how it works +===================== + +Purpose +------- + +The goal of autofs is to provide on-demand mounting and race free +automatic unmounting of various other filesystems. This provides two +key advantages: + +1. There is no need to delay boot until all filesystems that + might be needed are mounted. Processes that try to access those + slow filesystems might be delayed but other processes can + continue freely. This is particularly important for + network filesystems (e.g. NFS) or filesystems stored on + media with a media-changing robot. + +2. The names and locations of filesystems can be stored in + a remote database and can change at any time. The content + in that data base at the time of access will be used to provide + a target for the access. The interpretation of names in the + filesystem can even be programmatic rather than database-backed, + allowing wildcards for example, and can vary based on the user who + first accessed a name. + +Context +------- + +The "autofs" filesystem module is only one part of an autofs system. +There also needs to be a user-space program which looks up names +and mounts filesystems. This will often be the "automount" program, +though other tools including "systemd" can make use of "autofs". +This document describes only the kernel module and the interactions +required with any user-space program. Subsequent text refers to this +as the "automount daemon" or simply "the daemon". + +"autofs" is a Linux kernel module with provides the "autofs" +filesystem type. Several "autofs" filesystems can be mounted and they +can each be managed separately, or all managed by the same daemon. + +Content +------- + +An autofs filesystem can contain 3 sorts of objects: directories, +symbolic links and mount traps. Mount traps are directories with +extra properties as described in the next section. + +Objects can only be created by the automount daemon: symlinks are +created with a regular `symlink` system call, while directories and +mount traps are created with `mkdir`. The determination of whether a +directory should be a mount trap or not is quite _ad hoc_, largely for +historical reasons, and is determined in part by the +*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option. + +If neither the *direct* or *offset* mount options are given (so the +mount is considered to be *indirect*), then the root directory is +always a regular directory, otherwise it is a mount trap when it is +empty and a regular directory when not empty. Note that *direct* and +*offset* are treated identically so a concise summary is that the root +directory is a mount trap only if the filesystem is mounted *direct* +and the root is empty. + +Directories created in the root directory are mount traps only if the +filesystem is mounted *indirect* and they are empty. + +Directories further down the tree depend on the *maxproto* mount +option and particularly whether it is less than five or not. +When *maxproto* is five, no directories further down the +tree are ever mount traps, they are always regular directories. When +the *maxproto* is four (or three), these directories are mount traps +precisely when they are empty. + +So: non-empty (i.e. non-leaf) directories are never mount traps. Empty +directories are sometimes mount traps, and sometimes not depending on +where in the tree they are (root, top level, or lower), the *maxproto*, +and whether the mount was *indirect* or not. + +Mount Traps +--------------- + +A core element of the implementation of autofs is the Mount Traps +which are provided by the Linux VFS. Any directory provided by a +filesystem can be designated as a trap. This involves two separate +features that work together to allow autofs to do its job. + +**DCACHE_NEED_AUTOMOUNT** + +If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if +the inode has S_AUTOMOUNT set, or can be set directly) then it is +(potentially) a mount trap. Any access to this directory beyond a +"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation +to be called. The task of this method is to find the filesystem that +should be mounted on the directory and to return it. The VFS is +responsible for actually mounting the root of this filesystem on the +directory. + +autofs doesn't find the filesystem itself but sends a message to the +automount daemon asking it to find and mount the filesystem. The +autofs `d_automount` method then waits for the daemon to report that +everything is ready. It will then return "`NULL`" indicating that the +mount has already happened. The VFS doesn't try to mount anything but +follows down the mount that is already there. + +This functionality is sufficient for some users of mount traps such +as NFS which creates traps so that mountpoints on the server can be +reflected on the client. However it is not sufficient for autofs. As +mounting onto a directory is considered to be "beyond a `stat`", the +automount daemon would not be able to mount a filesystem on the 'trap' +directory without some way to avoid getting caught in the trap. For +that purpose there is another flag. + +**DCACHE_MANAGE_TRANSIT** + +If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but +related behaviors are invoked, both using the `d_op->d_manage()` +dentry operation. + +Firstly, before checking to see if any filesystem is mounted on the +directory, d_manage() will be called with the `rcu_walk` parameter set +to `false`. It may return one of three things: + +- A return value of zero indicates that there is nothing special + about this dentry and normal checks for mounts and automounts + should proceed. + + autofs normally returns zero, but first waits for any + expiry (automatic unmounting of the mounted filesystem) to + complete. This avoids races. + +- A return value of `-EISDIR` tells the VFS to ignore any mounts + on the directory and to not consider calling `->d_automount()`. + This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag + causing the directory not be a mount trap after all. + + autofs returns this if it detects that the process performing the + lookup is the automount daemon and that the mount has been + requested but has not yet completed. How it determines this is + discussed later. This allows the automount daemon not to get + caught in the mount trap. + + There is a subtlety here. It is possible that a second autofs + filesystem can be mounted below the first and for both of them to + be managed by the same daemon. For the daemon to be able to mount + something on the second it must be able to "walk" down past the + first. This means that d_manage cannot *always* return -EISDIR for + the automount daemon. It must only return it when a mount has + been requested, but has not yet completed. + + `d_manage` also returns `-EISDIR` if the dentry shouldn't be a + mount trap, either because it is a symbolic link or because it is + not empty. + +- Any other negative value is treated as an error and returned + to the caller. + + autofs can return + + - -ENOENT if the automount daemon failed to mount anything, + - -ENOMEM if it ran out of memory, + - -EINTR if a signal arrived while waiting for expiry to + complete + - or any other error sent down by the automount daemon. + + +The second use case only occurs during an "RCU-walk" and so `rcu_walk` +will be set. + +An RCU-walk is a fast and lightweight process for walking down a +filename path (i.e. it is like running on tip-toes). RCU-walk cannot +cope with all situations so when it finds a difficulty it falls back +to "REF-walk", which is slower but more robust. + +RCU-walk will never call `->d_automount`; the filesystems must already +be mounted or RCU-walk cannot handle the path. +To determine if a mount-trap is safe for RCU-walk mode it calls +`->d_manage()` with `rcu_walk` set to `true`. + +In this case `d_manage()` must avoid blocking and should avoid taking +spinlocks if at all possible. Its sole purpose is to determine if it +would be safe to follow down into any mounted directory and the only +reason that it might not be is if an expiry of the mount is +underway. + +In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the +VFS that this is a directory that doesn't require d_automount. If +`rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing +mounted, it *will* fall back to REF-walk. `d_manage()` cannot make the +VFS remain in RCU-walk mode, but can only tell it to get out of +RCU-walk mode by returning `-ECHILD`. + +So `d_manage()`, when called with `rcu_walk` set, should either return +-ECHILD if there is any reason to believe it is unsafe to end the +mounted filesystem, and otherwise should return 0. + +autofs will return `-ECHILD` if an expiry of the filesystem has been +initiated or is being considered, otherwise it returns 0. + + +Mountpoint expiry +----------------- + +The VFS has a mechanism for automatically expiring unused mounts, +much as it can expire any unused dentry information from the dcache. +This is guided by the MNT_SHRINKABLE flag. This only applies to +mounts that were created by `d_automount()` returning a filesystem to be +mounted. As autofs doesn't return such a filesystem but leaves the +mounting to the automount daemon, it must involve the automount daemon +in unmounting as well. This also means that autofs has more control +of expiry. + +The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to +the `umount` system call. Unmounting with MNT_EXPIRE will fail unless +a previous attempt had been made, and the filesystem has been inactive +and untouched since that previous attempt. autofs does not depend on +this but has its own internal tracking of whether filesystems were +recently used. This allows individual names in the autofs directory +to expire separately. + +With version 4 of the protocol, the automount daemon can try to +unmount any filesystems mounted on the autofs filesystem or remove any +symbolic links or empty directories any time it likes. If the unmount +or removal is successful the filesystem will be returned to the state +it was before the mount or creation, so that any access of the name +will trigger normal auto-mount processing. In particlar, `rmdir` and +`unlink` do not leave negative entries in the dcache as a normal +filesystem would, so an attempt to access a recently-removed object is +passed to autofs for handling. + +With version 5, this is not safe except for unmounting from top-level +directories. As lower-level directories are never mount traps, other +processes will see an empty directory as soon as the filesystem is +unmounted. So it is generally safest to use the autofs expiry +protocol described below. + +Normally the daemon only wants to remove entries which haven't been +used for a while. For this purpose autofs maintains a "`last_used`" +time stamp on each directory or symlink. For symlinks it genuinely +does record the last time the symlink was "used" or followed to find +out where it points to. For directories the field is a slight +misnomer. It actually records the last time that autofs checked if +the directory or one of its descendents was busy and found that it +was. This is just as useful and doesn't require updating the field so +often. + +The daemon is able to ask autofs if anything is due to be expired, +using an `ioctl` as discussed later. For a *direct* mount, autofs +considers if the entire mount-tree can be unmounted or not. For an +*indirect* mount, autofs considers each of the names in the top level +directory to determine if any of those can be unmounted and cleaned +up. + +There is an option with indirect mounts to consider each of the leaves +that has been mounted on instead of considering the top-level names. +This is intended for compatability with version 4 of autofs and should +be considered as deprecated. + +When autofs considers a directory it checks the `last_used` time and +compares it with the "timeout" value set when the filesystem was +mounted, though this check is ignored in some cases. It also checks if +the directory or anything below it is in use. For symbolic links, +only the `last_used` time is ever considered. + +If both appear to support expiring the directory or symlink, an action +is taken. + +There are two ways to ask autofs to consider expiry. The first is to +use the **AUTOFS_IOC_EXPIRE** ioctl. This only works for indirect +mounts. If it finds something in the root directory to expire it will +return the name of that thing. Once a name has been returned the +automount daemon needs to unmount any filesystems mounted below the +name normally. As described above, this is unsafe for non-toplevel +mounts in a version-5 autofs. For this reason the current `automountd` +does not use this ioctl. + +The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or +the **AUTOFS_IOC_EXPIRE_MULTI** ioctl. This will work for both direct and +indirect mounts. If it selects an object to expire, it will notify +the daemon using the notification mechanism described below. This +will block until the daemon acknowledges the expiry notification. +This implies that the "`EXPIRE`" ioctl must be sent from a different +thread than the one which handles notification. + +While the ioctl is blocking, the entry is marked as "expiring" and +`d_manage` will block until the daemon affirms that the unmount has +completed (together with removing any directories that might have been +necessary), or has been aborted. + +Communicating with autofs: detecting the daemon +----------------------------------------------- + +There are several forms of communication between the automount daemon +and the filesystem. As we have already seen, the daemon can create and +remove directories and symlinks using normal filesystem operations. +autofs knows whether a process requesting some operation is the daemon +or not based on its process-group id number (see getpgid(1)). + +When an autofs filesystem is mounted the pgid of the mounting +processes is recorded unless the "pgrp=" option is given, in which +case that number is recorded instead. Any request arriving from a +process in that process group is considered to come from the daemon. +If the daemon ever has to be stopped and restarted a new pgid can be +provided through an ioctl as will be described below. + +Communicating with autofs: the event pipe +----------------------------------------- + +When an autofs filesystem is mounted, the 'write' end of a pipe must +be passed using the 'fd=' mount option. autofs will write +notification messages to this pipe for the daemon to respond to. +For version 5, the format of the message is: + + struct autofs_v5_packet { + int proto_version; /* Protocol version */ + int type; /* Type of packet */ + autofs_wqt_t wait_queue_token; + __u32 dev; + __u64 ino; + __u32 uid; + __u32 gid; + __u32 pid; + __u32 tgid; + __u32 len; + char name[NAME_MAX+1]; + }; + +where the type is one of + + autofs_ptype_missing_indirect + autofs_ptype_expire_indirect + autofs_ptype_missing_direct + autofs_ptype_expire_direct + +so messages can indicate that a name is missing (something tried to +access it but it isn't there) or that it has been selected for expiry. + +The pipe will be set to "packet mode" (equivalent to passing +`O_DIRECT`) to _pipe2(2)_ so that a read from the pipe will return at +most one packet, and any unread portion of a packet will be discarded. + +The `wait_queue_token` is a unique number which can identify a +particular request to be acknowledged. When a message is sent over +the pipe the affected dentry is marked as either "active" or +"expiring" and other accesses to it block until the message is +acknowledged using one of the ioctls below and the relevant +`wait_queue_token`. + +Communicating with autofs: root directory ioctls +------------------------------------------------ + +The root directory of an autofs filesystem will respond to a number of +ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN +capability, or must be the automount daemon. + +The available ioctl commands are: + +- **AUTOFS_IOC_READY**: a notification has been handled. The argument + to the ioctl command is the "wait_queue_token" number + corresponding to the notification being acknowledged. +- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with + the error code `ENOENT`. +- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic" + mode meaning that it stops sending notifications to the daemon. + This mode is also entered if a write to the pipe fails. +- **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use. +- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which + is really a version number for the implementation. It is + currently 2. +- **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned + long. The value is used to set the timeout for expiry, and + the current timeout value is stored back through the pointer. +- **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if + the filesystem could be unmounted. This is only a hint as + the situation could change at any instant. This call can be + use to avoid a more expensive full unmount attempt. +- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is + anything suitable to expire. A pointer to a packet: + + struct autofs_packet_expire_multi { + int proto_version; /* Protocol version */ + int type; /* Type of packet */ + autofs_wqt_t wait_queue_token; + int len; + char name[NAME_MAX+1]; + }; + + is required. This is filled in with the name of something + that can be unmounted or removed. If nothing can be expired, + `errno` is set to `EAGAIN`. Even though a `wait_queue_token` + is present in the structure, no "wait queue" is established + and no acknowledgment is needed. +- **AUTOFS_IOC_EXPIRE_MULTI**: This is similar to + **AUTOFS_IOC_EXPIRE** except that it causes notification to be + sent to the daemon, and it blocks until the daemon acknowledges. + The argument is an integer which can contain two different flags. + + **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored + and objects are expired if the are not in use. + + **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level + name to expire. This is only safe when *maxproto* is 4. + +Communicating with autofs: char-device ioctls +--------------------------------------------- + +It is not always possible to open the root of an autofs filesystem, +particularly a *direct* mounted filesystem. If the automount daemon +is restarted there is no way for it to regain control of existing +mounts using any of the above communication channels. To address this +need there is a "miscellaneous" character device (major 10, minor 235) +which can be used to communicate directly with the autofs filesystem. +It requires CAP_SYS_ADMIN for access. + +The `ioctl`s that can be used on this device are described in a separate +document `autofs-mount-control.txt`, and are summarized briefly here. +Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure: + + struct autofs_dev_ioctl { + __u32 ver_major; + __u32 ver_minor; + __u32 size; /* total size of data passed in + * including this struct */ + __s32 ioctlfd; /* automount command fd */ + + /* Command parameters */ + union { + struct args_protover protover; + struct args_protosubver protosubver; + struct args_openmount openmount; + struct args_ready ready; + struct args_fail fail; + struct args_setpipefd setpipefd; + struct args_timeout timeout; + struct args_requester requester; + struct args_expire expire; + struct args_askumount askumount; + struct args_ismountpoint ismountpoint; + }; + + char path[0]; + }; + +For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target +filesystem is identified by the `path`. All other commands identify +the filesystem by the `ioctlfd` which is a file descriptor open on the +root, and which can be returned by **OPEN_MOUNT**. + +The `ver_major` and `ver_minor` are in/out parameters which check that +the requested version is supported, and report the maximum version +that the kernel module can support. + +Commands are: + +- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and + set version numbers. +- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor + on the root of an autofs filesystem. The filesystem is identified + by name and device number, which is stored in `openmount.devid`. + Device numbers for existing filesystems can be found in + `/proc/self/mountinfo`. +- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`. +- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in + catatonic mode, this can provide the write end of a new pipe + in `setpipefd.pipefd` to re-establish communication with a daemon. + The process group of the calling process is used to identify the + daemon. +- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a + name within the filesystem that has been auto-mounted on. + On successful return, `requester.uid` and `requester.gid` will be + the UID and GID of the process which triggered that mount. +- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a + mountpoint of a particular type - see separate documentation for + details. +- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**: +- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**: +- **AUTOFS_DEV_IOCTL_READY_CMD**: +- **AUTOFS_DEV_IOCTL_FAIL_CMD**: +- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**: +- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**: +- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**: +- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**: These all have the same + function as the similarly named **AUTOFS_IOC** ioctls, except + that **FAIL** can be given an explicit error number in `fail.status` + instead of assuming `ENOENT`, and this **EXPIRE** command + corresponds to **AUTOFS_IOC_EXPIRE_MULTI**. + +Catatonic mode +-------------- + +As mentioned, an autofs mount can enter "catatonic" mode. This +happens if a write to the notification pipe fails, or if it is +explicitly requested by an `ioctl`. + +When entering catatonic mode, the pipe is closed and any pending +notifications are acknowledged with the error `ENOENT`. + +Once in catatonic mode attempts to access non-existing names will +result in `ENOENT` while attempts to access existing directories will +be treated in the same way as if they came from the daemon, so mount +traps will not fire. + +When the filesystem is mounted a _uid_ and _gid_ can be given which +set the ownership of directories and symbolic links. When the +filesystem is in catatonic mode, any process with a matching UID can +create directories or symlinks in the root directory, but not in other +directories. + +Catatonic mode can only be left via the +**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`. + +autofs, name spaces, and shared mounts +-------------------------------------- + +With bind mounts and name spaces it is possible for an autofs +filesystem to appear at multiple places in one or more filesystem +name spaces. For this to work sensibly, the autofs filesystem should +always be mounted "shared". e.g. + +> `mount --make-shared /autofs/mount/point` + +The automount daemon is only able to manage a single mount location for +an autofs filesystem and if mounts on that are not 'shared', other +locations will not behave as expected. In particular access to those +other locations will likely result in the `ELOOP` error + +> Too many levels of symbolic links diff --git a/Documentation/filesystems/autofs4-mount-control.txt b/Documentation/filesystems/autofs4-mount-control.txt deleted file mode 100644 index e5177cb31a040be01b5bf169e9e06d513596c64a..0000000000000000000000000000000000000000 --- a/Documentation/filesystems/autofs4-mount-control.txt +++ /dev/null @@ -1,407 +0,0 @@ - -Miscellaneous Device control operations for the autofs4 kernel module -==================================================================== - -The problem -=========== - -There is a problem with active restarts in autofs (that is to say -restarting autofs when there are busy mounts). - -During normal operation autofs uses a file descriptor opened on the -directory that is being managed in order to be able to issue control -operations. Using a file descriptor gives ioctl operations access to -autofs specific information stored in the super block. The operations -are things such as setting an autofs mount catatonic, setting the -expire timeout and requesting expire checks. As is explained below, -certain types of autofs triggered mounts can end up covering an autofs -mount itself which prevents us being able to use open(2) to obtain a -file descriptor for these operations if we don't already have one open. - -Currently autofs uses "umount -l" (lazy umount) to clear active mounts -at restart. While using lazy umount works for most cases, anything that -needs to walk back up the mount tree to construct a path, such as -getcwd(2) and the proc file system /proc//cwd, no longer works -because the point from which the path is constructed has been detached -from the mount tree. - -The actual problem with autofs is that it can't reconnect to existing -mounts. Immediately one thinks of just adding the ability to remount -autofs file systems would solve it, but alas, that can't work. This is -because autofs direct mounts and the implementation of "on demand mount -and expire" of nested mount trees have the file system mounted directly -on top of the mount trigger directory dentry. - -For example, there are two types of automount maps, direct (in the kernel -module source you will see a third type called an offset, which is just -a direct mount in disguise) and indirect. - -Here is a master map with direct and indirect map entries: - -/- /etc/auto.direct -/test /etc/auto.indirect - -and the corresponding map files: - -/etc/auto.direct: - -/automount/dparse/g6 budgie:/autofs/export1 -/automount/dparse/g1 shark:/autofs/export1 -and so on. - -/etc/auto.indirect: - -g1 shark:/autofs/export1 -g6 budgie:/autofs/export1 -and so on. - -For the above indirect map an autofs file system is mounted on /test and -mounts are triggered for each sub-directory key by the inode lookup -operation. So we see a mount of shark:/autofs/export1 on /test/g1, for -example. - -The way that direct mounts are handled is by making an autofs mount on -each full path, such as /automount/dparse/g1, and using it as a mount -trigger. So when we walk on the path we mount shark:/autofs/export1 "on -top of this mount point". Since these are always directories we can -use the follow_link inode operation to trigger the mount. - -But, each entry in direct and indirect maps can have offsets (making -them multi-mount map entries). - -For example, an indirect mount map entry could also be: - -g1 \ - / shark:/autofs/export5/testing/test \ - /s1 shark:/autofs/export/testing/test/s1 \ - /s2 shark:/autofs/export5/testing/test/s2 \ - /s1/ss1 shark:/autofs/export1 \ - /s2/ss2 shark:/autofs/export2 - -and a similarly a direct mount map entry could also be: - -/automount/dparse/g1 \ - / shark:/autofs/export5/testing/test \ - /s1 shark:/autofs/export/testing/test/s1 \ - /s2 shark:/autofs/export5/testing/test/s2 \ - /s1/ss1 shark:/autofs/export2 \ - /s2/ss2 shark:/autofs/export2 - -One of the issues with version 4 of autofs was that, when mounting an -entry with a large number of offsets, possibly with nesting, we needed -to mount and umount all of the offsets as a single unit. Not really a -problem, except for people with a large number of offsets in map entries. -This mechanism is used for the well known "hosts" map and we have seen -cases (in 2.4) where the available number of mounts are exhausted or -where the number of privileged ports available is exhausted. - -In version 5 we mount only as we go down the tree of offsets and -similarly for expiring them which resolves the above problem. There is -somewhat more detail to the implementation but it isn't needed for the -sake of the problem explanation. The one important detail is that these -offsets are implemented using the same mechanism as the direct mounts -above and so the mount points can be covered by a mount. - -The current autofs implementation uses an ioctl file descriptor opened -on the mount point for control operations. The references held by the -descriptor are accounted for in checks made to determine if a mount is -in use and is also used to access autofs file system information held -in the mount super block. So the use of a file handle needs to be -retained. - - -The Solution -============ - -To be able to restart autofs leaving existing direct, indirect and -offset mounts in place we need to be able to obtain a file handle -for these potentially covered autofs mount points. Rather than just -implement an isolated operation it was decided to re-implement the -existing ioctl interface and add new operations to provide this -functionality. - -In addition, to be able to reconstruct a mount tree that has busy mounts, -the uid and gid of the last user that triggered the mount needs to be -available because these can be used as macro substitution variables in -autofs maps. They are recorded at mount request time and an operation -has been added to retrieve them. - -Since we're re-implementing the control interface, a couple of other -problems with the existing interface have been addressed. First, when -a mount or expire operation completes a status is returned to the -kernel by either a "send ready" or a "send fail" operation. The -"send fail" operation of the ioctl interface could only ever send -ENOENT so the re-implementation allows user space to send an actual -status. Another expensive operation in user space, for those using -very large maps, is discovering if a mount is present. Usually this -involves scanning /proc/mounts and since it needs to be done quite -often it can introduce significant overhead when there are many entries -in the mount table. An operation to lookup the mount status of a mount -point dentry (covered or not) has also been added. - -Current kernel development policy recommends avoiding the use of the -ioctl mechanism in favor of systems such as Netlink. An implementation -using this system was attempted to evaluate its suitability and it was -found to be inadequate, in this case. The Generic Netlink system was -used for this as raw Netlink would lead to a significant increase in -complexity. There's no question that the Generic Netlink system is an -elegant solution for common case ioctl functions but it's not a complete -replacement probably because its primary purpose in life is to be a -message bus implementation rather than specifically an ioctl replacement. -While it would be possible to work around this there is one concern -that lead to the decision to not use it. This is that the autofs -expire in the daemon has become far to complex because umount -candidates are enumerated, almost for no other reason than to "count" -the number of times to call the expire ioctl. This involves scanning -the mount table which has proved to be a big overhead for users with -large maps. The best way to improve this is try and get back to the -way the expire was done long ago. That is, when an expire request is -issued for a mount (file handle) we should continually call back to -the daemon until we can't umount any more mounts, then return the -appropriate status to the daemon. At the moment we just expire one -mount at a time. A Generic Netlink implementation would exclude this -possibility for future development due to the requirements of the -message bus architecture. - - -autofs4 Miscellaneous Device mount control interface -==================================================== - -The control interface is opening a device node, typically /dev/autofs. - -All the ioctls use a common structure to pass the needed parameter -information and return operation results: - -struct autofs_dev_ioctl { - __u32 ver_major; - __u32 ver_minor; - __u32 size; /* total size of data passed in - * including this struct */ - __s32 ioctlfd; /* automount command fd */ - - /* Command parameters */ - union { - struct args_protover protover; - struct args_protosubver protosubver; - struct args_openmount openmount; - struct args_ready ready; - struct args_fail fail; - struct args_setpipefd setpipefd; - struct args_timeout timeout; - struct args_requester requester; - struct args_expire expire; - struct args_askumount askumount; - struct args_ismountpoint ismountpoint; - }; - - char path[0]; -}; - -The ioctlfd field is a mount point file descriptor of an autofs mount -point. It is returned by the open call and is used by all calls except -the check for whether a given path is a mount point, where it may -optionally be used to check a specific mount corresponding to a given -mount point file descriptor, and when requesting the uid and gid of the -last successful mount on a directory within the autofs file system. - -The union is used to communicate parameters and results of calls made -as described below. - -The path field is used to pass a path where it is needed and the size field -is used account for the increased structure length when translating the -structure sent from user space. - -This structure can be initialized before setting specific fields by using -the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *). - -All of the ioctls perform a copy of this structure from user space to -kernel space and return -EINVAL if the size parameter is smaller than -the structure size itself, -ENOMEM if the kernel memory allocation fails -or -EFAULT if the copy itself fails. Other checks include a version check -of the compiled in user space version against the module version and a -mismatch results in a -EINVAL return. If the size field is greater than -the structure size then a path is assumed to be present and is checked to -ensure it begins with a "/" and is NULL terminated, otherwise -EINVAL is -returned. Following these checks, for all ioctl commands except -AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and -AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is -not a valid descriptor or doesn't correspond to an autofs mount point -an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is -returned. - - -The ioctls -========== - -An example of an implementation which uses this interface can be seen -in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the -distribution tar available for download from kernel.org in directory -/pub/linux/daemons/autofs/v5. - -The device node ioctl operations implemented by this interface are: - - -AUTOFS_DEV_IOCTL_VERSION ------------------------- - -Get the major and minor version of the autofs4 device ioctl kernel module -implementation. It requires an initialized struct autofs_dev_ioctl as an -input parameter and sets the version information in the passed in structure. -It returns 0 on success or the error -EINVAL if a version mismatch is -detected. - - -AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD ------------------------------------------------------------------- - -Get the major and minor version of the autofs4 protocol version understood -by loaded module. This call requires an initialized struct autofs_dev_ioctl -with the ioctlfd field set to a valid autofs mount point descriptor -and sets the requested version number in version field of struct args_protover -or sub_version field of struct args_protosubver. These commands return -0 on success or one of the negative error codes if validation fails. - - -AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT ----------------------------------------------------------- - -Obtain and release a file descriptor for an autofs managed mount point -path. The open call requires an initialized struct autofs_dev_ioctl with -the path field set and the size field adjusted appropriately as well -as the devid field of struct args_openmount set to the device number of -the autofs mount. The device number can be obtained from the mount options -shown in /proc/mounts. The close call requires an initialized struct -autofs_dev_ioct with the ioctlfd field set to the descriptor obtained -from the open call. The release of the file descriptor can also be done -with close(2) so any open descriptors will also be closed at process exit. -The close call is included in the implemented operations largely for -completeness and to provide for a consistent user space implementation. - - -AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD --------------------------------------------------------- - -Return mount and expire result status from user space to the kernel. -Both of these calls require an initialized struct autofs_dev_ioctl -with the ioctlfd field set to the descriptor obtained from the open -call and the token field of struct args_ready or struct args_fail set -to the wait queue token number, received by user space in the foregoing -mount or expire request. The status field of struct args_fail is set to -the errno of the operation. It is set to 0 on success. - - -AUTOFS_DEV_IOCTL_SETPIPEFD_CMD ------------------------------- - -Set the pipe file descriptor used for kernel communication to the daemon. -Normally this is set at mount time using an option but when reconnecting -to a existing mount we need to use this to tell the autofs mount about -the new kernel pipe descriptor. In order to protect mounts against -incorrectly setting the pipe descriptor we also require that the autofs -mount be catatonic (see next call). - -The call requires an initialized struct autofs_dev_ioctl with the -ioctlfd field set to the descriptor obtained from the open call and -the pipefd field of struct args_setpipefd set to descriptor of the pipe. -On success the call also sets the process group id used to identify the -controlling process (eg. the owning automount(8) daemon) to the process -group of the caller. - - -AUTOFS_DEV_IOCTL_CATATONIC_CMD ------------------------------- - -Make the autofs mount point catatonic. The autofs mount will no longer -issue mount requests, the kernel communication pipe descriptor is released -and any remaining waits in the queue released. - -The call requires an initialized struct autofs_dev_ioctl with the -ioctlfd field set to the descriptor obtained from the open call. - - -AUTOFS_DEV_IOCTL_TIMEOUT_CMD ----------------------------- - -Set the expire timeout for mounts within an autofs mount point. - -The call requires an initialized struct autofs_dev_ioctl with the -ioctlfd field set to the descriptor obtained from the open call. - - -AUTOFS_DEV_IOCTL_REQUESTER_CMD ------------------------------- - -Return the uid and gid of the last process to successfully trigger a the -mount on the given path dentry. - -The call requires an initialized struct autofs_dev_ioctl with the path -field set to the mount point in question and the size field adjusted -appropriately. Upon return the uid field of struct args_requester contains -the uid and gid field the gid. - -When reconstructing an autofs mount tree with active mounts we need to -re-connect to mounts that may have used the original process uid and -gid (or string variations of them) for mount lookups within the map entry. -This call provides the ability to obtain this uid and gid so they may be -used by user space for the mount map lookups. - - -AUTOFS_DEV_IOCTL_EXPIRE_CMD ---------------------------- - -Issue an expire request to the kernel for an autofs mount. Typically -this ioctl is called until no further expire candidates are found. - -The call requires an initialized struct autofs_dev_ioctl with the -ioctlfd field set to the descriptor obtained from the open call. In -addition an immediate expire, independent of the mount timeout, can be -requested by setting the how field of struct args_expire to 1. If no -expire candidates can be found the ioctl returns -1 with errno set to -EAGAIN. - -This call causes the kernel module to check the mount corresponding -to the given ioctlfd for mounts that can be expired, issues an expire -request back to the daemon and waits for completion. - -AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD ------------------------------- - -Checks if an autofs mount point is in use. - -The call requires an initialized struct autofs_dev_ioctl with the -ioctlfd field set to the descriptor obtained from the open call and -it returns the result in the may_umount field of struct args_askumount, -1 for busy and 0 otherwise. - - -AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD ---------------------------------- - -Check if the given path is a mountpoint. - -The call requires an initialized struct autofs_dev_ioctl. There are two -possible variations. Both use the path field set to the path of the mount -point to check and the size field adjusted appropriately. One uses the -ioctlfd field to identify a specific mount point to check while the other -variation uses the path and optionally in.type field of struct args_ismountpoint -set to an autofs mount type. The call returns 1 if this is a mount point -and sets out.devid field to the device number of the mount and out.magic -field to the relevant super block magic number (described below) or 0 if -it isn't a mountpoint. In both cases the the device number (as returned -by new_encode_dev()) is returned in out.devid field. - -If supplied with a file descriptor we're looking for a specific mount, -not necessarily at the top of the mounted stack. In this case the path -the descriptor corresponds to is considered a mountpoint if it is itself -a mountpoint or contains a mount, such as a multi-mount without a root -mount. In this case we return 1 if the descriptor corresponds to a mount -point and and also returns the super magic of the covering mount if there -is one or 0 if it isn't a mountpoint. - -If a path is supplied (and the ioctlfd field is set to -1) then the path -is looked up and is checked to see if it is the root of a mount. If a -type is also given we are looking for a particular autofs mount and if -a match isn't found a fail is returned. If the the located path is the -root of a mount 1 is returned along with the super magic of the mount -or 0 otherwise. - diff --git a/Documentation/filesystems/autofs4.txt b/Documentation/filesystems/autofs4.txt deleted file mode 100644 index f10dd590f69fe3e9384ebf0b9d9ef87cd91f88e7..0000000000000000000000000000000000000000 --- a/Documentation/filesystems/autofs4.txt +++ /dev/null @@ -1,529 +0,0 @@ - - - - -autofs - how it works -===================== - -Purpose -------- - -The goal of autofs is to provide on-demand mounting and race free -automatic unmounting of various other filesystems. This provides two -key advantages: - -1. There is no need to delay boot until all filesystems that - might be needed are mounted. Processes that try to access those - slow filesystems might be delayed but other processes can - continue freely. This is particularly important for - network filesystems (e.g. NFS) or filesystems stored on - media with a media-changing robot. - -2. The names and locations of filesystems can be stored in - a remote database and can change at any time. The content - in that data base at the time of access will be used to provide - a target for the access. The interpretation of names in the - filesystem can even be programmatic rather than database-backed, - allowing wildcards for example, and can vary based on the user who - first accessed a name. - -Context -------- - -The "autofs4" filesystem module is only one part of an autofs system. -There also needs to be a user-space program which looks up names -and mounts filesystems. This will often be the "automount" program, -though other tools including "systemd" can make use of "autofs4". -This document describes only the kernel module and the interactions -required with any user-space program. Subsequent text refers to this -as the "automount daemon" or simply "the daemon". - -"autofs4" is a Linux kernel module with provides the "autofs" -filesystem type. Several "autofs" filesystems can be mounted and they -can each be managed separately, or all managed by the same daemon. - -Content -------- - -An autofs filesystem can contain 3 sorts of objects: directories, -symbolic links and mount traps. Mount traps are directories with -extra properties as described in the next section. - -Objects can only be created by the automount daemon: symlinks are -created with a regular `symlink` system call, while directories and -mount traps are created with `mkdir`. The determination of whether a -directory should be a mount trap or not is quite _ad hoc_, largely for -historical reasons, and is determined in part by the -*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option. - -If neither the *direct* or *offset* mount options are given (so the -mount is considered to be *indirect*), then the root directory is -always a regular directory, otherwise it is a mount trap when it is -empty and a regular directory when not empty. Note that *direct* and -*offset* are treated identically so a concise summary is that the root -directory is a mount trap only if the filesystem is mounted *direct* -and the root is empty. - -Directories created in the root directory are mount traps only if the -filesystem is mounted *indirect* and they are empty. - -Directories further down the tree depend on the *maxproto* mount -option and particularly whether it is less than five or not. -When *maxproto* is five, no directories further down the -tree are ever mount traps, they are always regular directories. When -the *maxproto* is four (or three), these directories are mount traps -precisely when they are empty. - -So: non-empty (i.e. non-leaf) directories are never mount traps. Empty -directories are sometimes mount traps, and sometimes not depending on -where in the tree they are (root, top level, or lower), the *maxproto*, -and whether the mount was *indirect* or not. - -Mount Traps ---------------- - -A core element of the implementation of autofs is the Mount Traps -which are provided by the Linux VFS. Any directory provided by a -filesystem can be designated as a trap. This involves two separate -features that work together to allow autofs to do its job. - -**DCACHE_NEED_AUTOMOUNT** - -If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if -the inode has S_AUTOMOUNT set, or can be set directly) then it is -(potentially) a mount trap. Any access to this directory beyond a -"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation -to be called. The task of this method is to find the filesystem that -should be mounted on the directory and to return it. The VFS is -responsible for actually mounting the root of this filesystem on the -directory. - -autofs doesn't find the filesystem itself but sends a message to the -automount daemon asking it to find and mount the filesystem. The -autofs `d_automount` method then waits for the daemon to report that -everything is ready. It will then return "`NULL`" indicating that the -mount has already happened. The VFS doesn't try to mount anything but -follows down the mount that is already there. - -This functionality is sufficient for some users of mount traps such -as NFS which creates traps so that mountpoints on the server can be -reflected on the client. However it is not sufficient for autofs. As -mounting onto a directory is considered to be "beyond a `stat`", the -automount daemon would not be able to mount a filesystem on the 'trap' -directory without some way to avoid getting caught in the trap. For -that purpose there is another flag. - -**DCACHE_MANAGE_TRANSIT** - -If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but -related behaviors are invoked, both using the `d_op->d_manage()` -dentry operation. - -Firstly, before checking to see if any filesystem is mounted on the -directory, d_manage() will be called with the `rcu_walk` parameter set -to `false`. It may return one of three things: - -- A return value of zero indicates that there is nothing special - about this dentry and normal checks for mounts and automounts - should proceed. - - autofs normally returns zero, but first waits for any - expiry (automatic unmounting of the mounted filesystem) to - complete. This avoids races. - -- A return value of `-EISDIR` tells the VFS to ignore any mounts - on the directory and to not consider calling `->d_automount()`. - This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag - causing the directory not be a mount trap after all. - - autofs returns this if it detects that the process performing the - lookup is the automount daemon and that the mount has been - requested but has not yet completed. How it determines this is - discussed later. This allows the automount daemon not to get - caught in the mount trap. - - There is a subtlety here. It is possible that a second autofs - filesystem can be mounted below the first and for both of them to - be managed by the same daemon. For the daemon to be able to mount - something on the second it must be able to "walk" down past the - first. This means that d_manage cannot *always* return -EISDIR for - the automount daemon. It must only return it when a mount has - been requested, but has not yet completed. - - `d_manage` also returns `-EISDIR` if the dentry shouldn't be a - mount trap, either because it is a symbolic link or because it is - not empty. - -- Any other negative value is treated as an error and returned - to the caller. - - autofs can return - - - -ENOENT if the automount daemon failed to mount anything, - - -ENOMEM if it ran out of memory, - - -EINTR if a signal arrived while waiting for expiry to - complete - - or any other error sent down by the automount daemon. - - -The second use case only occurs during an "RCU-walk" and so `rcu_walk` -will be set. - -An RCU-walk is a fast and lightweight process for walking down a -filename path (i.e. it is like running on tip-toes). RCU-walk cannot -cope with all situations so when it finds a difficulty it falls back -to "REF-walk", which is slower but more robust. - -RCU-walk will never call `->d_automount`; the filesystems must already -be mounted or RCU-walk cannot handle the path. -To determine if a mount-trap is safe for RCU-walk mode it calls -`->d_manage()` with `rcu_walk` set to `true`. - -In this case `d_manage()` must avoid blocking and should avoid taking -spinlocks if at all possible. Its sole purpose is to determine if it -would be safe to follow down into any mounted directory and the only -reason that it might not be is if an expiry of the mount is -underway. - -In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the -VFS that this is a directory that doesn't require d_automount. If -`rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing -mounted, it *will* fall back to REF-walk. `d_manage()` cannot make the -VFS remain in RCU-walk mode, but can only tell it to get out of -RCU-walk mode by returning `-ECHILD`. - -So `d_manage()`, when called with `rcu_walk` set, should either return --ECHILD if there is any reason to believe it is unsafe to end the -mounted filesystem, and otherwise should return 0. - -autofs will return `-ECHILD` if an expiry of the filesystem has been -initiated or is being considered, otherwise it returns 0. - - -Mountpoint expiry ------------------ - -The VFS has a mechanism for automatically expiring unused mounts, -much as it can expire any unused dentry information from the dcache. -This is guided by the MNT_SHRINKABLE flag. This only applies to -mounts that were created by `d_automount()` returning a filesystem to be -mounted. As autofs doesn't return such a filesystem but leaves the -mounting to the automount daemon, it must involve the automount daemon -in unmounting as well. This also means that autofs has more control -of expiry. - -The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to -the `umount` system call. Unmounting with MNT_EXPIRE will fail unless -a previous attempt had been made, and the filesystem has been inactive -and untouched since that previous attempt. autofs4 does not depend on -this but has its own internal tracking of whether filesystems were -recently used. This allows individual names in the autofs directory -to expire separately. - -With version 4 of the protocol, the automount daemon can try to -unmount any filesystems mounted on the autofs filesystem or remove any -symbolic links or empty directories any time it likes. If the unmount -or removal is successful the filesystem will be returned to the state -it was before the mount or creation, so that any access of the name -will trigger normal auto-mount processing. In particlar, `rmdir` and -`unlink` do not leave negative entries in the dcache as a normal -filesystem would, so an attempt to access a recently-removed object is -passed to autofs for handling. - -With version 5, this is not safe except for unmounting from top-level -directories. As lower-level directories are never mount traps, other -processes will see an empty directory as soon as the filesystem is -unmounted. So it is generally safest to use the autofs expiry -protocol described below. - -Normally the daemon only wants to remove entries which haven't been -used for a while. For this purpose autofs maintains a "`last_used`" -time stamp on each directory or symlink. For symlinks it genuinely -does record the last time the symlink was "used" or followed to find -out where it points to. For directories the field is a slight -misnomer. It actually records the last time that autofs checked if -the directory or one of its descendents was busy and found that it -was. This is just as useful and doesn't require updating the field so -often. - -The daemon is able to ask autofs if anything is due to be expired, -using an `ioctl` as discussed later. For a *direct* mount, autofs -considers if the entire mount-tree can be unmounted or not. For an -*indirect* mount, autofs considers each of the names in the top level -directory to determine if any of those can be unmounted and cleaned -up. - -There is an option with indirect mounts to consider each of the leaves -that has been mounted on instead of considering the top-level names. -This is intended for compatability with version 4 of autofs and should -be considered as deprecated. - -When autofs considers a directory it checks the `last_used` time and -compares it with the "timeout" value set when the filesystem was -mounted, though this check is ignored in some cases. It also checks if -the directory or anything below it is in use. For symbolic links, -only the `last_used` time is ever considered. - -If both appear to support expiring the directory or symlink, an action -is taken. - -There are two ways to ask autofs to consider expiry. The first is to -use the **AUTOFS_IOC_EXPIRE** ioctl. This only works for indirect -mounts. If it finds something in the root directory to expire it will -return the name of that thing. Once a name has been returned the -automount daemon needs to unmount any filesystems mounted below the -name normally. As described above, this is unsafe for non-toplevel -mounts in a version-5 autofs. For this reason the current `automountd` -does not use this ioctl. - -The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or -the **AUTOFS_IOC_EXPIRE_MULTI** ioctl. This will work for both direct and -indirect mounts. If it selects an object to expire, it will notify -the daemon using the notification mechanism described below. This -will block until the daemon acknowledges the expiry notification. -This implies that the "`EXPIRE`" ioctl must be sent from a different -thread than the one which handles notification. - -While the ioctl is blocking, the entry is marked as "expiring" and -`d_manage` will block until the daemon affirms that the unmount has -completed (together with removing any directories that might have been -necessary), or has been aborted. - -Communicating with autofs: detecting the daemon ------------------------------------------------ - -There are several forms of communication between the automount daemon -and the filesystem. As we have already seen, the daemon can create and -remove directories and symlinks using normal filesystem operations. -autofs knows whether a process requesting some operation is the daemon -or not based on its process-group id number (see getpgid(1)). - -When an autofs filesystem is mounted the pgid of the mounting -processes is recorded unless the "pgrp=" option is given, in which -case that number is recorded instead. Any request arriving from a -process in that process group is considered to come from the daemon. -If the daemon ever has to be stopped and restarted a new pgid can be -provided through an ioctl as will be described below. - -Communicating with autofs: the event pipe ------------------------------------------ - -When an autofs filesystem is mounted, the 'write' end of a pipe must -be passed using the 'fd=' mount option. autofs will write -notification messages to this pipe for the daemon to respond to. -For version 5, the format of the message is: - - struct autofs_v5_packet { - int proto_version; /* Protocol version */ - int type; /* Type of packet */ - autofs_wqt_t wait_queue_token; - __u32 dev; - __u64 ino; - __u32 uid; - __u32 gid; - __u32 pid; - __u32 tgid; - __u32 len; - char name[NAME_MAX+1]; - }; - -where the type is one of - - autofs_ptype_missing_indirect - autofs_ptype_expire_indirect - autofs_ptype_missing_direct - autofs_ptype_expire_direct - -so messages can indicate that a name is missing (something tried to -access it but it isn't there) or that it has been selected for expiry. - -The pipe will be set to "packet mode" (equivalent to passing -`O_DIRECT`) to _pipe2(2)_ so that a read from the pipe will return at -most one packet, and any unread portion of a packet will be discarded. - -The `wait_queue_token` is a unique number which can identify a -particular request to be acknowledged. When a message is sent over -the pipe the affected dentry is marked as either "active" or -"expiring" and other accesses to it block until the message is -acknowledged using one of the ioctls below and the relevant -`wait_queue_token`. - -Communicating with autofs: root directory ioctls ------------------------------------------------- - -The root directory of an autofs filesystem will respond to a number of -ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN -capability, or must be the automount daemon. - -The available ioctl commands are: - -- **AUTOFS_IOC_READY**: a notification has been handled. The argument - to the ioctl command is the "wait_queue_token" number - corresponding to the notification being acknowledged. -- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with - the error code `ENOENT`. -- **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic" - mode meaning that it stops sending notifications to the daemon. - This mode is also entered if a write to the pipe fails. -- **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use. -- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which - is really a version number for the implementation. It is - currently 2. -- **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned - long. The value is used to set the timeout for expiry, and - the current timeout value is stored back through the pointer. -- **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if - the filesystem could be unmounted. This is only a hint as - the situation could change at any instant. This call can be - use to avoid a more expensive full unmount attempt. -- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is - anything suitable to expire. A pointer to a packet: - - struct autofs_packet_expire_multi { - int proto_version; /* Protocol version */ - int type; /* Type of packet */ - autofs_wqt_t wait_queue_token; - int len; - char name[NAME_MAX+1]; - }; - - is required. This is filled in with the name of something - that can be unmounted or removed. If nothing can be expired, - `errno` is set to `EAGAIN`. Even though a `wait_queue_token` - is present in the structure, no "wait queue" is established - and no acknowledgment is needed. -- **AUTOFS_IOC_EXPIRE_MULTI**: This is similar to - **AUTOFS_IOC_EXPIRE** except that it causes notification to be - sent to the daemon, and it blocks until the daemon acknowledges. - The argument is an integer which can contain two different flags. - - **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored - and objects are expired if the are not in use. - - **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level - name to expire. This is only safe when *maxproto* is 4. - -Communicating with autofs: char-device ioctls ---------------------------------------------- - -It is not always possible to open the root of an autofs filesystem, -particularly a *direct* mounted filesystem. If the automount daemon -is restarted there is no way for it to regain control of existing -mounts using any of the above communication channels. To address this -need there is a "miscellaneous" character device (major 10, minor 235) -which can be used to communicate directly with the autofs filesystem. -It requires CAP_SYS_ADMIN for access. - -The `ioctl`s that can be used on this device are described in a separate -document `autofs4-mount-control.txt`, and are summarized briefly here. -Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure: - - struct autofs_dev_ioctl { - __u32 ver_major; - __u32 ver_minor; - __u32 size; /* total size of data passed in - * including this struct */ - __s32 ioctlfd; /* automount command fd */ - - /* Command parameters */ - union { - struct args_protover protover; - struct args_protosubver protosubver; - struct args_openmount openmount; - struct args_ready ready; - struct args_fail fail; - struct args_setpipefd setpipefd; - struct args_timeout timeout; - struct args_requester requester; - struct args_expire expire; - struct args_askumount askumount; - struct args_ismountpoint ismountpoint; - }; - - char path[0]; - }; - -For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target -filesystem is identified by the `path`. All other commands identify -the filesystem by the `ioctlfd` which is a file descriptor open on the -root, and which can be returned by **OPEN_MOUNT**. - -The `ver_major` and `ver_minor` are in/out parameters which check that -the requested version is supported, and report the maximum version -that the kernel module can support. - -Commands are: - -- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and - set version numbers. -- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor - on the root of an autofs filesystem. The filesystem is identified - by name and device number, which is stored in `openmount.devid`. - Device numbers for existing filesystems can be found in - `/proc/self/mountinfo`. -- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`. -- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in - catatonic mode, this can provide the write end of a new pipe - in `setpipefd.pipefd` to re-establish communication with a daemon. - The process group of the calling process is used to identify the - daemon. -- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a - name within the filesystem that has been auto-mounted on. - On successful return, `requester.uid` and `requester.gid` will be - the UID and GID of the process which triggered that mount. -- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a - mountpoint of a particular type - see separate documentation for - details. -- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**: -- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**: -- **AUTOFS_DEV_IOCTL_READY_CMD**: -- **AUTOFS_DEV_IOCTL_FAIL_CMD**: -- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**: -- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**: -- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**: -- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**: These all have the same - function as the similarly named **AUTOFS_IOC** ioctls, except - that **FAIL** can be given an explicit error number in `fail.status` - instead of assuming `ENOENT`, and this **EXPIRE** command - corresponds to **AUTOFS_IOC_EXPIRE_MULTI**. - -Catatonic mode --------------- - -As mentioned, an autofs mount can enter "catatonic" mode. This -happens if a write to the notification pipe fails, or if it is -explicitly requested by an `ioctl`. - -When entering catatonic mode, the pipe is closed and any pending -notifications are acknowledged with the error `ENOENT`. - -Once in catatonic mode attempts to access non-existing names will -result in `ENOENT` while attempts to access existing directories will -be treated in the same way as if they came from the daemon, so mount -traps will not fire. - -When the filesystem is mounted a _uid_ and _gid_ can be given which -set the ownership of directories and symbolic links. When the -filesystem is in catatonic mode, any process with a matching UID can -create directories or symlinks in the root directory, but not in other -directories. - -Catatonic mode can only be left via the -**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`. - -autofs, name spaces, and shared mounts --------------------------------------- - -With bind mounts and name spaces it is possible for an autofs -filesystem to appear at multiple places in one or more filesystem -name spaces. For this to work sensibly, the autofs filesystem should -always be mounted "shared". e.g. - -> `mount --make-shared /autofs/mount/point` - -The automount daemon is only able to manage a single mount location for -an autofs filesystem and if mounts on that are not 'shared', other -locations will not behave as expected. In particular access to those -other locations will likely result in the `ELOOP` error - -> Too many levels of symbolic links diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt index 7eb762eb31361739bac381a7453ecb449facf161..b0afd3d55eaf15c8d1c07b838c2f88d47d5a56c4 100644 --- a/Documentation/filesystems/automount-support.txt +++ b/Documentation/filesystems/automount-support.txt @@ -9,7 +9,7 @@ also be requested by userspace. IN-KERNEL AUTOMOUNTING ====================== -See section "Mount Traps" of Documentation/filesystems/autofs4.txt +See section "Mount Traps" of Documentation/filesystems/autofs.txt Then from userspace, you can just do something like: diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt index d7f011ddc1500cdf8e705430c90ed60deb7a2ecc..8bf62240e10d35a7dc99fac14f86afeb14313490 100644 --- a/Documentation/filesystems/ceph.txt +++ b/Documentation/filesystems/ceph.txt @@ -105,15 +105,13 @@ Mount Options address its connection to the monitor originates from. wsize=X - Specify the maximum write size in bytes. By default there is no - maximum. Ceph will normally size writes based on the file stripe - size. + Specify the maximum write size in bytes. Default: 16 MB. rsize=X - Specify the maximum read size in bytes. Default: 64 MB. + Specify the maximum read size in bytes. Default: 16 MB. rasize=X - Specify the maximum readahead. Default: 8 MB. + Specify the maximum readahead size in bytes. Default: 8 MB. mount_timeout=X Specify the timeout value for mount (in seconds), in the case diff --git a/Documentation/filesystems/cifs/AUTHORS b/Documentation/filesystems/cifs/AUTHORS index 9f4f87e1624036349533adf9534bfd3c4b08535d..75865da2ce1475c27bea1050b3e80ff7160f6d6f 100644 --- a/Documentation/filesystems/cifs/AUTHORS +++ b/Documentation/filesystems/cifs/AUTHORS @@ -42,9 +42,11 @@ Jeff Layton (many, many fixes, as well as great work on the cifs Kerberos code) Scott Lovenberg Pavel Shilovsky (for great work adding SMB2 support, and various SMB3 features) Aurelien Aptel (for DFS SMB3 work and some key bug fixes) -Ronnie Sahlberg (for SMB3 xattr work and bug fixes) +Ronnie Sahlberg (for SMB3 xattr work, bug fixes, and lots of great work on compounding) Shirish Pargaonkar (for many ACL patches over the years) Sachin Prabhu (many bug fixes, including for reconnect, copy offload and security) +Paulo Alcantara +Long Li (some great work on RDMA, SMB Direct) Test case and Bug Report contributors @@ -58,5 +60,4 @@ mention to the Stanford Checker (SWAT) which pointed out many minor bugs in error paths. Valuable suggestions also have come from Al Viro and Dave Miller. -And thanks to the IBM LTC and Power test teams and SuSE testers for -finding multiple bugs during excellent stress test runs. +And thanks to the IBM LTC and Power test teams and SuSE and Citrix and RedHat testers for finding multiple bugs during excellent stress test runs. diff --git a/Documentation/filesystems/cifs/CHANGES b/Documentation/filesystems/cifs/CHANGES index bc0025cdd1c9c0d285c32e8d7656103868126ca8..455e1cc494a9f2e78ee1d45b89bfbe5ee55048eb 100644 --- a/Documentation/filesystems/cifs/CHANGES +++ b/Documentation/filesystems/cifs/CHANGES @@ -1,3 +1,6 @@ +See https://wiki.samba.org/index.php/LinuxCIFSKernel for +more current information. + Version 1.62 ------------ Add sockopt=TCP_NODELAY mount option. EA (xattr) routines hardened diff --git a/Documentation/filesystems/cifs/TODO b/Documentation/filesystems/cifs/TODO index c5adf149b57f7f8f6e2d0b104d5b74f6bc7f5f84..852499aed64b52bb321c0b9656b0b606a4710772 100644 --- a/Documentation/filesystems/cifs/TODO +++ b/Documentation/filesystems/cifs/TODO @@ -9,14 +9,14 @@ is a partial list of the known problems and missing features: a) SMB3 (and SMB3.02) missing optional features: - multichannel (started), integration with RDMA - - directory leases (improved metadata caching) - - T10 copy offload (copy chunk, and "Duplicate Extents" ioctl + - directory leases (improved metadata caching), started (root dir only) + - T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl currently the only two server side copy mechanisms supported) b) improved sparse file support c) Directory entry caching relies on a 1 second timer, rather than -using Directory Leases +using Directory Leases, currently only the root file handle is cached longer d) quota support (needs minor kernel change since quota calls to make it to network filesystems or deviceless filesystems) @@ -42,6 +42,8 @@ mount or a per server basis to client UIDs or nobody if no mapping exists. Also better integration with winbind for resolving SID owners k) Add tools to take advantage of more smb3 specific ioctls and features +(passthrough ioctl/fsctl for sending various SMB3 fsctls to the server +is in progress) l) encrypted file support @@ -71,9 +73,8 @@ t) split cifs and smb3 support into separate modules so legacy (and less secure) CIFS dialect can be disabled in environments that don't need it and simplify the code. -u) Finish up SMB3.1.1 dialect support - -v) POSIX Extensions for SMB3.1.1 +v) POSIX Extensions for SMB3.1.1 (started, create and mkdir support added +so far). KNOWN BUGS ==================================== @@ -92,8 +93,8 @@ Misc testing to do 1) check out max path names and max path name components against various server types. Try nested symlinks (8 deep). Return max path name in stat -f information -2) Improve xfstest's cifs enablement and adapt xfstests where needed to test -cifs better +2) Improve xfstest's cifs/smb3 enablement and adapt xfstests where needed to test +cifs/smb3 better 3) Additional performance testing and optimization using iozone and similar - there are some easy changes that can be done to parallelize sequential writes, diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index 12a147c9f87f208bc018419d50b4306ba907a017..69f8de99573974749263a33d75904ad5d37971de 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -182,13 +182,15 @@ whint_mode=%s Control which write hints are passed down to block passes down hints with its policy. alloc_mode=%s Adjust block allocation policy, which supports "reuse" and "default". -fsync_mode=%s Control the policy of fsync. Currently supports "posix" - and "strict". In "posix" mode, which is default, fsync - will follow POSIX semantics and does a light operation - to improve the filesystem performance. In "strict" mode, - fsync will be heavy and behaves in line with xfs, ext4 - and btrfs, where xfstest generic/342 will pass, but the - performance will regress. +fsync_mode=%s Control the policy of fsync. Currently supports "posix", + "strict", and "nobarrier". In "posix" mode, which is + default, fsync will follow POSIX semantics and does a + light operation to improve the filesystem performance. + In "strict" mode, fsync will be heavy and behaves in line + with xfs, ext4 and btrfs, where xfstest generic/342 will + pass, but the performance will regress. "nobarrier" is + based on "posix", but doesn't issue flush command for + non-atomic files likewise "nobarrier" mount option. test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt context. The fake fscrypt context is used by xfstests. diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index cfbc18f0d9c985a16f6ff9dcd665f6e8d017a067..48b424de85bbca87f840774883132671aec50f02 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -191,11 +191,21 @@ Currently, the following pairs of encryption modes are supported: - AES-256-XTS for contents and AES-256-CTS-CBC for filenames - AES-128-CBC for contents and AES-128-CTS-CBC for filenames +- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames It is strongly recommended to use AES-256-XTS for contents encryption. AES-128-CBC was added only for low-powered embedded devices with crypto accelerators such as CAAM or CESA that do not support XTS. +Similarly, Speck128/256 support was only added for older or low-end +CPUs which cannot do AES fast enough -- especially ARM CPUs which have +NEON instructions but not the Cryptography Extensions -- and for which +it would not otherwise be feasible to use encryption at all. It is +not recommended to use Speck on CPUs that have AES instructions. +Speck support is only available if it has been enabled in the crypto +API via CONFIG_CRYPTO_SPECK. Also, on ARM platforms, to get +acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled. + New encryption modes can be added relatively easily, without changes to individual filesystems. However, authenticated encryption (AE) modes are not currently supported because of the difficulty of dealing diff --git a/Documentation/filesystems/fuse-io.txt b/Documentation/filesystems/fuse-io.txt new file mode 100644 index 0000000000000000000000000000000000000000..07b8f73f100f689263d148563528b82c2a6948c6 --- /dev/null +++ b/Documentation/filesystems/fuse-io.txt @@ -0,0 +1,38 @@ +Fuse supports the following I/O modes: + +- direct-io +- cached + + write-through + + writeback-cache + +The direct-io mode can be selected with the FOPEN_DIRECT_IO flag in the +FUSE_OPEN reply. + +In direct-io mode the page cache is completely bypassed for reads and writes. +No read-ahead takes place. Shared mmap is disabled. + +In cached mode reads may be satisfied from the page cache, and data may be +read-ahead by the kernel to fill the cache. The cache is always kept consistent +after any writes to the file. All mmap modes are supported. + +The cached mode has two sub modes controlling how writes are handled. The +write-through mode is the default and is supported on all kernels. The +writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the +FUSE_INIT reply. + +In write-through mode each write is immediately sent to userspace as one or more +WRITE requests, as well as updating any cached pages (and caching previously +uncached, but fully written pages). No READ requests are ever sent for writes, +so when an uncached page is partially written, the page is discarded. + +In writeback-cache mode (enabled by the FUSE_WRITEBACK_CACHE flag) writes go to +the cache only, which means that the write(2) syscall can often complete very +fast. Dirty pages are written back implicitly (background writeback or page +reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and +when the last ref to the file is being released on munmap(2)). This mode +assumes that all changes to the filesystem go through the FUSE kernel module +(size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so +it's generally not suitable for network filesystems. If a partial page is +written, then the page needs to be first read from userspace. This means, that +even for files opened for O_WRONLY it is possible that READ requests will be +generated by the kernel. diff --git a/Documentation/filesystems/ncpfs.txt b/Documentation/filesystems/ncpfs.txt deleted file mode 100644 index 5af164f4b37b530fba1ffbb238d0dcd1331c2092..0000000000000000000000000000000000000000 --- a/Documentation/filesystems/ncpfs.txt +++ /dev/null @@ -1,12 +0,0 @@ -The ncpfs filesystem understands the NCP protocol, designed by the -Novell Corporation for their NetWare(tm) product. NCP is functionally -similar to the NFS used in the TCP/IP community. -To mount a NetWare filesystem, you need a special mount program, which -can be found in the ncpfs package. The home site for ncpfs is -ftp.gwdg.de/pub/linux/misc/ncpfs, but sunsite and its many mirrors -will have it as well. - -Related products are linware and mars_nwe, which will give Linux partial -NetWare server functionality. - -mars_nwe can be found on ftp.gwdg.de/pub/linux/misc/ncpfs. diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt index 5efae00f6c7fd82f863939ac2e6f2883a52fee09..d2963123eb1ccca8a299486f1eb998f05acf5041 100644 --- a/Documentation/filesystems/nfs/nfsroot.txt +++ b/Documentation/filesystems/nfs/nfsroot.txt @@ -5,6 +5,7 @@ Written 1996 by Gero Kuhlmann Updated 1997 by Martin Mares Updated 2006 by Nico Schottelius Updated 2006 by Horms +Updated 2018 by Chris Novakovic @@ -79,7 +80,7 @@ nfsroot=[:][,] ip=::::::: - : + :: This parameter tells the kernel how to configure IP addresses of devices and also how to set up the IP routing table. It was originally called @@ -110,6 +111,9 @@ ip=::::::: will not be triggered if it is missing and NFS root is not in operation. + Value is exported to /proc/net/pnp with the prefix "bootserver " + (see below). + Default: Determined using autoconfiguration. The address of the autoconfiguration server is used. @@ -123,10 +127,13 @@ ip=::::::: Default: Determined using autoconfiguration. - Name of the client. May be supplied by autoconfiguration, - but its absence will not trigger autoconfiguration. - If specified and DHCP is used, the user provided hostname will - be carried in the DHCP request to hopefully update DNS record. + Name of the client. If a '.' character is present, anything + before the first '.' is used as the client's hostname, and anything + after it is used as its NIS domain name. May be supplied by + autoconfiguration, but its absence will not trigger autoconfiguration. + If specified and DHCP is used, the user-provided hostname (and NIS + domain name, if present) will be carried in the DHCP request; this + may cause a DNS record to be created or updated for the client. Default: Client IP address is used in ASCII notation. @@ -162,12 +169,55 @@ ip=::::::: Default: any - IP address of first nameserver. - Value gets exported by /proc/net/pnp which is often linked - on embedded systems by /etc/resolv.conf. + IP address of primary nameserver. + Value is exported to /proc/net/pnp with the prefix "nameserver " + (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + IP address of secondary nameserver. + See . + + IP address of a Network Time Protocol (NTP) server. + Value is exported to /proc/net/ipconfig/ntp_servers, but is + otherwise unused (see below). + + Default: None if not using autoconfiguration; determined + automatically if using autoconfiguration. + + After configuration (whether manual or automatic) is complete, two files + are created in the following format; lines are omitted if their respective + value is empty following configuration: + + - /proc/net/pnp: + + #PROTO: (depending on configuration method) + domain (if autoconfigured, the DNS domain) + nameserver (primary name server IP) + nameserver (secondary name server IP) + nameserver (tertiary name server IP) + bootserver (NFS server IP) + + - /proc/net/ipconfig/ntp_servers: + + (NTP server IP) + (NTP server IP) + (NTP server IP) + + and (in /proc/net/pnp) and and + (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration; + they cannot be specified as part of the "ip=" kernel command line parameter. + + Because the "domain" and "nameserver" options are recognised by DNS + resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems + that use an NFS root filesystem. - IP address of second nameserver. - Same as above. + Note that the kernel will not synchronise the system time with any NTP + servers it discovers; this is the responsibility of a user space process + (e.g. an initrd/initramfs script that passes the IP addresses listed in + /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real + root filesystem if it is on NFS). nfsrootdebug diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt index 961b287ef3233ef5bac93ab76f7ed30377d80e34..72615a2c07529ab487e3c9aa4604ed9f0755e66d 100644 --- a/Documentation/filesystems/overlayfs.txt +++ b/Documentation/filesystems/overlayfs.txt @@ -429,11 +429,12 @@ This verification may cause significant overhead in some cases. Testsuite --------- -There's testsuite developed by David Howells at: +There's a testsuite originally developed by David Howells and currently +maintained by Amir Goldstein at: - git://git.infradead.org/users/dhowells/unionmount-testsuite.git + https://github.com/amir73il/unionmount-testsuite.git Run as root: # cd unionmount-testsuite - # ./run --ov + # ./run --ov --verify diff --git a/Documentation/filesystems/path-lookup.md b/Documentation/filesystems/path-lookup.md index 1933ef734e63b03802b42ce236b2b664b239dd93..e2edd45c4bc0882260b2506bdd9b1929d746e6b7 100644 --- a/Documentation/filesystems/path-lookup.md +++ b/Documentation/filesystems/path-lookup.md @@ -460,7 +460,7 @@ this retry process in the next article. Automount points are locations in the filesystem where an attempt to lookup a name can trigger changes to how that lookup should be handled, in particular by mounting a filesystem there. These are -covered in greater detail in autofs4.txt in the Linux documentation +covered in greater detail in autofs.txt in the Linux documentation tree, but a few notes specifically related to path lookup are in order here. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 2a84bb3348947c1c0302ecfaa3a01a21fa558fef..520f6a84cf50ffefacea03a47685e5dcffa4ba76 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -515,7 +515,8 @@ guarantees: The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG bits on both physical and virtual pages associated with a process, and the -soft-dirty bit on pte (see Documentation/vm/soft-dirty.txt for details). +soft-dirty bit on pte (see Documentation/admin-guide/mm/soft-dirty.rst +for details). To clear the bits for all the pages associated with the process > echo 1 > /proc/PID/clear_refs @@ -536,7 +537,8 @@ Any other value written to /proc/PID/clear_refs will have no effect. The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags using /proc/kpageflags and number of times a page is mapped using -/proc/kpagecount. For detailed explanation, see Documentation/vm/pagemap.txt. +/proc/kpagecount. For detailed explanation, see +Documentation/admin-guide/mm/pagemap.rst. The /proc/pid/numa_maps is an extension based on maps, showing the memory locality and binding policy, as well as the memory usage (in pages) of @@ -564,7 +566,7 @@ address policy mapping details Where: "address" is the starting address for the mapping; -"policy" reports the NUMA memory policy set for the mapping (see vm/numa_memory_policy.txt); +"policy" reports the NUMA memory policy set for the mapping (see Documentation/admin-guide/mm/numa_memory_policy.rst); "mapping details" summarizes mapping data such as mapping type, page usage counters, node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page size, in KB, that is backing the mapping up. diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index a85355cf85f449da93c7bd9fc98690344aeba134..d06e9a59a9f4a82964f39c3d93c0c5d557f81326 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -105,8 +105,9 @@ policy for the file will revert to "default" policy. NUMA memory allocation policies have optional flags that can be used in conjunction with their modes. These optional flags can be specified when tmpfs is mounted by appending them to the mode before the NodeList. -See Documentation/vm/numa_memory_policy.txt for a list of all available -memory allocation policy mode flags and their effect on memory policy. +See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of +all available memory allocation policy mode flags and their effect on +memory policy. =static is equivalent to MPOL_F_STATIC_NODES =relative is equivalent to MPOL_F_RELATIVE_NODES diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 5fd325df59e2233df60cbf6da6c92230aa0c26ef..f608180ad59d71ab2bcc2d2d818699bfaaee1470 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -856,7 +856,7 @@ struct file_operations { ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); int (*iterate) (struct file *, struct dir_context *); - unsigned int (*poll) (struct file *, struct poll_table_struct *); + __poll_t (*poll) (struct file *, struct poll_table_struct *); long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); long (*c