Python/Strings

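Introduction

What is a string?

A string is a data type that represents a sequence of characters. Strings are central to programming: they are used to process text, store and display data, read user input, and much more. In Python, a string literal is written between single quotes ('), double quotes ("), or triple quotes (''' or """).

String basics in Python

A string can be created in any of the following ways.

# Enclose in single quotes
string1 = 'こんにちは'

# Enclose in double quotes
string2 = "Pythonの世界へようこそ"

# Enclose in triple quotes (multiline string)
string3 = '''これは
複数行の
文字列です'''

# Enclose in triple quotes (double-quote version)
string4 = """こちらも
複数行の
文字列です"""

All strings created in these ways are objects of type str.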

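Basic string operations

Getting the length of a string

Use the built-in len() function to get the length of a string, that is, the number of characters it contains.

message = "Hello, World!"
print(len(message))  # Output: 13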

Indexing and slicing

An index identifies the position of each character in a string. Python indexes start at 0.

message = "Python"

# Get a single character by index
print(message[0])  # Output: P
print(message[1])  # Output: y
print(message[-1])  # Output: n (negative indexes count from the end)

# Get a substring by slicing
print(message[0:2])  # Output: Py
print(message[2:])   # Output: thon
print(message[:4])   # Output: Pyth
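
The slicing rules above leave a few details implicit: the end index is exclusive, an optional third value sets the step, and a slice tolerates out-of-range bounds whereas a single index does not. The following is a minimal sketch of those points; the variable names are chosen only for illustration.

```python
message = "Python"

# The end index is exclusive, so [0:2] covers positions 0 and 1 only.
first_two = message[0:2]

# A third value sets the step; a step of -1 walks the string backwards.
every_other = message[::2]
reversed_message = message[::-1]

# A slice is clipped to the string's bounds instead of raising an error.
clipped = message[2:100]

print(first_two, every_other, reversed_message, clipped)

# A single out-of-range index, by contrast, raises IndexError.
try:
    message[100]
except IndexError as error:
    print("IndexError:", error)
```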

Concatenation and repetition

Use the + operator to concatenate strings and the * operator to repeat a string.

# Concatenating strings
greeting = "Hello" + " " + "World"
print(greeting)  # Output: Hello World

# Repeating a string
echo = "Echo! " * 3
print(echo)  # Output: Echo! Echo! Echo!

Escape sequences

An escape sequence lets you include special characters in a string.

# Newline
newline = "Hello\nWorld"
print(newline)

# Tab
tabbed = "Hello\tWorld"
print(tabbed)

# Single quote
single_quote = 'It\'s a wonderful day'
print(single_quote)

# Double quote
double_quote = "She said, \"Hello!\""
print(double_quote)

# Backslash
backslash = "This is a backslash: \\"
print(backslash)

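String methods

Searching strings

To search a string for a character or substring, use the find() or index() method.

message = "Hello, World!"

# find() method
print(message.find("World"))  # Output: 7 (the index where the match starts)

# index() method
print(message.index("World"))  # Output: 7 (the index where the match starts)

When the substring is not found, find() returns -1, whereas index() raises a ValueError.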
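
The difference between the two methods only shows up when the search fails. As a small sketch, the search term "Python" below is picked simply because it does not occur in message; the in operator is included as the usual way to ask a plain yes/no question.

```python
message = "Hello, World!"

# find() signals "not found" with the sentinel value -1.
position = message.find("Python")
print(position)

# index() signals "not found" by raising ValueError.
try:
    message.index("Python")
except ValueError as error:
    print("ValueError:", error)

# The in operator answers only whether the substring occurs at all.
contains_world = "World" in message
print(contains_world)
```

Because -1 is also a valid index (the last character), the result of find() should be compared with -1 before it is used to index the string.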

To check only the beginning or the end of a string, use startswith() and endswith().

# startswith() method
print(message.startswith("Hello"))  # Output: True

# endswith() method
print(message.endswith("!"))  # Output: True

Modifying strings

Strings are immutable, so an operation that modifies a string always returns a new string and leaves the original unchanged.

message = "Hello, World!"

# replace() method
new_message = message.replace("World", "Python")
print(new_message)  # Output: Hello, Python!

# strip(), lstrip(), and rstrip() methods
whitespace = "   Hello, World!   "
print(whitespace.strip())  # Output: Hello, World!
print(whitespace.lstrip())  # Output: Hello, World!   
print(whitespace.rstrip())  # Output:    Hello, World!
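
Immutability, as described at the start of this section, can be observed directly. The sketch below tries to assign to a single character, which Python rejects, and then confirms that replace() leaves the original string untouched. The replacement letter "J" is an arbitrary choice for illustration.

```python
message = "Hello, World!"

# Item assignment is not supported on str objects.
try:
    message[0] = "J"
except TypeError as error:
    print("TypeError:", error)

# replace() builds a new string; the third argument limits it to one replacement.
new_message = message.replace("H", "J", 1)
print(message)      # the original is unchanged
print(new_message)
```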

Converting strings

The upper(), lower(), and title() methods convert the case of a string.

message = "hello, world!"

# upper() method
print(message.upper())  # Output: HELLO, WORLD!

# lower() method
print(message.lower())  # Output: hello, world!

# title() method
print(message.title())  # Output: Hello, World!

The split() and join() methods split a string into a list and join a list back into a string.

# split() method
sentence = "This is a test."
words = sentence.split()
print(words)  # Output: ['This', 'is', 'a', 'test.']

# join() method
joined = " ".join(words)
print(joined)  # Output: This is a test.

Formatting

There are three ways to format a string: the older % operator, the str.format() method, and f-strings (formatted string literals), which were introduced in Python 3.6.

name = "Alice"
age = 30

# % operator
print("My name is %s and I am %d years old." % (name, age))

# str.format() method
print("My name is {} and I am {} years old.".format(name, age))

# f-string
print(f"My name is {name} and I am {age} years old.")
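
Both f-strings and str.format() also accept a format specification after a colon, which controls precision, thousands separators, and alignment. The following is a minimal sketch; the variables price, ratio, and label are made up for this example.

```python
price = 1234.5
ratio = 0.256
label = "total"

# Thousands separator and two decimal places.
formatted_price = f"{price:,.2f}"

# Percentage with one decimal place.
formatted_ratio = f"{ratio:.1%}"

# Right-align within a field ten characters wide.
padded_label = f"{label:>10}"

# str.format() takes the same specification inside the braces.
same_with_format = "{:,.2f}".format(price)

print(formatted_price, formatted_ratio, repr(padded_label), same_with_format)
```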

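Strings and encodings

Encoding basics

A character encoding defines how characters are converted to a sequence of bytes. In Python 3, a str holds Unicode text, and UTF-8 is the default encoding for source files and for the encode() and decode() methods.

Encoding and decoding strings

Use the encode() method of str to convert a string to bytes, and the decode() method of bytes to convert bytes back to a string.

# Encode a string to bytes
text = "こんにちは"
encoded_text = text.encode("utf-8")
print(encoded_text)  # Output: b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf'

# Decode bytes to a string
decoded_text = encoded_text.decode("utf-8")
print(decoded_text)  # Output: こんにちは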
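
Two consequences of the round trip above are worth seeing in code: the number of characters and the number of bytes generally differ, and decoding with the wrong codec fails. The sketch below uses "ascii" only as an example of a codec that cannot represent these bytes.

```python
text = "こんにちは"
utf8_bytes = text.encode("utf-8")

# len() counts characters for str and bytes for bytes.
char_count = len(text)
byte_count = len(utf8_bytes)
print(char_count, byte_count)

# Decoding with a codec that does not match the data raises UnicodeDecodeError.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as error:
    print("UnicodeDecodeError:", error)

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lossy = utf8_bytes.decode("ascii", errors="replace")
print(lossy)
```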

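String operations with regular expressions

Regular expression basics

A regular expression (regex) is a special string that describes a pattern of text. Python handles regular expressions through the re module.

Basic regular expression patterns

The following example shows a basic pattern.

import re

# Pattern: one or more digits
pattern = r"\d+"

# Target string
text = "There are 123 apples and 456 oranges."

# Find every match
matches = re.findall(pattern, text)
print(matches)  # Output: ['123', '456']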

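Regular expression examples

Regular expressions can be used to match, replace, and split strings. The examples below reuse text and the re import from the previous example.

# Pattern: the word to replace
pattern = r"apples"
replacement = "bananas"

# Replace
new_text = re.sub(pattern, replacement, text)
print(new_text)  # Output: There are 123 bananas and 456 oranges.

# Split on runs of whitespace
pattern = r"\s+"
split_text = re.split(pattern, text)
print(split_text)  # Output: ['There', 'are', '123', 'apples', 'and', '456', 'oranges.']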
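
The examples above cover replacing and splitting; matching can be sketched with re.search() and re.fullmatch(). The pattern with two capture groups below is an assumption made for this illustration, not part of the original examples.

```python
import re

text = "There are 123 apples and 456 oranges."

# re.search() returns a match object for the first match, or None if there is none.
match = re.search(r"(\d+) (\w+)", text)
if match:
    print(match.group(0))  # the whole match
    print(match.group(1))  # first capture group: the number
    print(match.group(2))  # second capture group: the word
    print(match.span())    # start and end positions of the match

# re.fullmatch() succeeds only if the entire string fits the pattern.
is_number = re.fullmatch(r"\d+", "2024") is not None
is_not_number = re.fullmatch(r"\d+", "2024a") is not None
print(is_number, is_not_number)
```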

Advanced string operations

String templates

The string.Template class provides template-based string substitution.

from string import Template

# Define the template
template = Template("Hello, $name! Welcome to $place.")

# Substitute values for the placeholders
result = template.substitute(name="Alice", place="Wonderland")
print(result)  # Output: Hello, Alice! Welcome to Wonderland.

Working with multiline strings

Use triple quotes to create a string that spans multiple lines.

multiline_string = """This is a
multiline string
in Python."""

print(multiline_string)

A multiline string can then be processed line by line.

# Split into a list of lines
lines = multiline_string.split("\n")
print(lines)  # Output: ['This is a', 'multiline string', 'in Python.']

# Strip whitespace from each line
trimmed_lines = [line.strip() for line in lines]
print(trimmed_lines)  # Output: ['This is a', 'multiline string', 'in Python.']

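Character translation and transformation

Use the translate() method together with the str.maketrans() static method to replace characters one for one according to a mapping table.

# Build the translation table
translation_map = str.maketrans("aeiou", "12345")

# Apply the table to a string
translated_string = "hello world".translate(translation_map)
print(translated_string)  # Output: h2ll4 w4rld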

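Practical string examples

Reading strings from and writing strings to files

The following example shows the basics of file operations. The encoding is passed explicitly because the default used by open() can depend on the platform.

# Write to a file
with open("example.txt", "w", encoding="utf-8") as file:
    file.write("Hello, World!\nThis is a test.")

# Read from a file
with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)
# Output:
# Hello, World!
# This is a test.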

Working with URLs and paths

Use the urllib.parse module to work with URLs.

from urllib.parse import urlparse

# Parse a URL
url = "https://www.example.com/path/to/page?name=example"
parsed_url = urlparse(url)
print(parsed_url.scheme)  # Output: https
print(parsed_url.netloc)  # Output: www.example.com
print(parsed_url.path)    # Output: /path/to/page
print(parsed_url.query)   # Output: name=example

Use the os.path module to work with file paths. The outputs shown are for POSIX systems; on Windows the separator is a backslash.

import os

# Join path components
file_path = os.path.join("folder", "subfolder", "file.txt")
print(file_path)  # Output: folder/subfolder/file.txt

# Get the file name
print(os.path.basename(file_path))  # Output: file.txt

# Get the directory name
print(os.path.dirname(file_path))   # Output: folder/subfolder

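String parsing and data extraction

The following is a practical example of parsing a string and extracting data from it.

import re

# Regular expression for a date in YYYY-MM-DD form
date_pattern = r"\d{4}-\d{2}-\d{2}"

# Extract the dates from a string
text = "The events are scheduled for 2024-06-19 and 2024-07-20."
dates = re.findall(date_pattern, text)
print(dates)  # Output: ['2024-06-19', '2024-07-20']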
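
Extraction often continues by breaking each match into its parts. One possible approach, sketched here, is to add capture groups to the same date pattern; re.findall() then returns one tuple of strings per match, which can be converted to integers. The names grouped_pattern, parts, and date_tuples are introduced only for this example.

```python
import re

# Same date pattern as above, with one capture group per component.
grouped_pattern = r"(\d{4})-(\d{2})-(\d{2})"
text = "The events are scheduled for 2024-06-19 and 2024-07-20."

# With capture groups, findall() returns a list of tuples of strings.
parts = re.findall(grouped_pattern, text)
print(parts)

# Convert each (year, month, day) tuple of strings to integers.
date_tuples = [tuple(int(value) for value in groups) for groups in parts]
print(date_tuples)
```

Note that the pattern checks only the shape of the text, so a string such as 2024-13-45 would also match.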

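Chapter exercises

Basic problems

Write a function that returns the length of a string.
```
print(get_length("Hello, World!"))  # Output: 13
```
Write a function that extracts part of a string.
```
print(get_substring("Hello, World!", 7, 12))  # Output: World
```

Advanced problems

Write a function that reverses the order of the words in a given string.
```
print(reverse_words("Hello World"))  # Output: World Hello
```
Write a function that searches a string for a regular expression pattern and replaces each match.
```
print(replace_pattern("Hello 123 World", r"\d+", "456"))  # Output: Hello 456 World
```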

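Sample solutions and explanations

The following are sample solutions to the exercises.

Basic problems

A function that returns the length of a string

def get_length(string):
    return len(string)

print(get_length("Hello, World!"))  # Output: 13

A function that extracts part of a string

def get_substring(string, start, end):
    return string[start:end]

print(get_substring("Hello, World!", 7, 12))  # Output: World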

Advanced problems

A function that reverses the order of the words in a given string

def reverse_words(string):
    words = string.split()
    reversed_words = " ".join(words[::-1])
    return reversed_words

print(reverse_words("Hello World"))  # Output: World Hello

A function that searches a string for a regular expression pattern and replaces each match

import re

def replace_pattern(string, pattern, replacement):
    return re.sub(pattern, replacement, string)

print(replace_pattern("Hello 123 World", r"\d+", "456"))  # Output: Hello 456 World
