3. Kanji compounds (熟語)
- **Individual Kanji characters** (字)
## So, let me tell you about [Kanji](https://en.wikipedia.org/wiki/Kanji)...
Ancient [logographic](https://en.wikipedia.org/wiki/Logogram) writing system originating from China.
One Kanji ≈ one word/concept. Multiple Kanji ≈ one word/concept.
停 電 停電 電停
[A thousand Kanji](https://en.wikipedia.org/wiki/Ky%C5%8Diku_kanji) in primary school and [another thousand](https://en.wikipedia.org/wiki/J%C5%8Dy%C5%8D_kanji) in high school.
Learning to **recognize**, read, pronounce and write Kanji is difficult.
# Kanji Primer
Some simple examples:
月 火 水 木 金 土 日
_Combine_ multiple Kanji together to make a single "complex" Kanji:
朋 棚 林 森 明 昌
Knowing the number of strokes in each Kanji is helpful:
ー 九 土 木 本 旭 体 侍 待 倉
Complex Kanji typically have a higher stroke count.
There are thousands of Kanji in use. How can we remember them?!
# Getting a Grip on the Kanji
金 + 失 = 鉄
If you can't _say_ it, you can't _use_ it.
How can we refer to these things?
We **cheat**. We assign _our own English keyword_ to each Kanji.
The keyword _may_ be related to the "meaning" of the Kanji.
We make up _our own story_ to connect the keyword to the Kanji, e.g.:
> Iron is an essential metal for the human body, so don't lose too much of it.
The [Heisig System](https://en.wikipedia.org/wiki/Remembering_the_Kanji_and_Remembering_the_Hanzi): 80% of the time, it works *every* time.
# Divide and Conquer
Complex Kanji often consist of simpler _parts_. Some complex examples:
Examples of parts:
These **parts** get called different things: [部首](https://ja.wikipedia.org/wiki/%E9%83%A8%E9%A6%96), [radicals](https://kanjialive.com/214-traditional-kanji-radicals), roots, primitive components...
Some parts can be Kanji themselves. Others only occur as part of something else.
1. How do we **organize** Kanji in a way that's easier to remember?
2. How do we **identify** Kanji that are similar to each other?
3. How do we **break down** complex Kanji into simpler parts?
4. How do we **look up** Kanji that we've never seen before?
5. How do we avoid doing the hard, manual work by ourselves?
class: inverse, center, middle
# [Time for some Python!](https://github.com/mpenkov/heisig/blob/master/demo.ipynb)
Covers **2,042** commonly used Kanji, dividing them into **56** _Lessons_.
We can now refer to 2,042 Kanji using an unambiguous English _keyword_.
For other Kanji, we have to improvise, but they are relatively rare.
From the [Electronic Dictionary Research & Development Group](http://www.edrdg.org/edrdg/index.html).
Decomposes **6,355** Kanji into **254** unique radicals.
哀 : 衣 口 亠
愛 : 心 爪 冖 夂
旭 : 日 九
梓 : 十 辛 木 立
圧 : 土 厂
Example applications: searching for Kanji by radicals (parts), handwriting recognition, **comparing Kanji**.
Problems: arbitrary order of radicals, no positional information.
# Radically Confusing
Example of _different_ Kanji that have exactly the same radicals:
日sun木tree 椙 杲 杳 橸
含 吟 合 哈
戎 戈 戉 戔
可 司 叮 哥
押 抽 抻
脅 肋 脇
桂 杜 埜
甲 申 由
細 累 縲
Luckily, most of these are pretty rare.
Covers **20,092** Kanji, decomposing each Kanji into smaller parts using [Polish notation](https://en.wikipedia.org/wiki/Polish_notation).
Total of 12 layouts (translingual Unicode characters):
⿰ ⿱ ⿸ ⿺ ⿵ ⿳ ⿴ ⿹ ⿲ ⿷ ⿻ ⿶
# Encountering the Unknown
Kanji search challenge: look up the characters below in a dictionary.
顳 鸚 爨 驪 纜 鱸 鸛
When you stumble upon Kanji you don't recognize, your options include:
1. Copy-paste into a dictionary (trivial)
2. Enter it using the IME (easy)
3. Look it up in a Kanji dictionary using the _main radical_ (tricky)
4. Estimate the stroke count, narrow down the results (tedious)
5. Enter it via handwriting recognition (tedious)
6. Ask a friend (potentially annoying for your Kanji-literate friend)
7. Give up. Pretend that you never saw it and move on with life (unrewarding)
Armed with Python and the knowledge from the previous slides, you have more options:
8. If you recognize _any_ of the radicals, look them up via KRAD
9. If you recognize any of the parts, look them up via CHISE
# Let's talk about [Graphs](https://en.wikipedia.org/wiki/Graph_theory)
Labeled vs unlabeled graphs
Application of graphs to Kanji visualization
# Unlabeled Graph Example
The graph below has 26 *nodes* (vertices, points) and 17 *edges* (arcs, lines).
# Connected Components
There are 9 *connected components* (clusters, groups) in the graph below.
# Labeled Graph Example
Each node corresponds to a Kanji. Edges connect related Kanji.
# [Subgraph](https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms#subgraph) Example
A subgraph containing only 16 nodes. Helps simplify complex graphs.
# Demo Time
Visually identify related Kanji
Study related Kanji together
Reinforce "problematic" Kanji
## Stuff We Talked About
- Brief introduction to Kanji
- Kanji resources: Heisig, KRAD, CHISE
- Basic graph theory: nodes, vertices, connected components, subgraphs
- Putting it all together: visualizing Kanji graphs
## Future Work
- Kanji pronunciation
- Identifying ambiguous keywords
- Application to words (Kanji compounds)
- Introduce directed graphs
Want to collaborate? Talk to me!
## Thank you for listening!