There is a foreign language which uses the latin alphabet, but the order among letters is not "a", "b", "c" ... "z" as in English.
You receive a list of non-empty strings words from the dictionary, where the words are sorted lexicographically based on the rules of this new language.
Derive the order of letters in this language. If the order is invalid, return an empty string. If there are multiple valid order of letters, return any of them.
A string a is lexicographically smaller than a string b if either of the following is true:
a than in b.a is a prefix of b and a.length < b.length.Example 1:
Input: ["z","o"]
Output: "zo"Explanation:
From "z" and "o", we know 'z' < 'o', so return "zo".
Example 2:
Input: ["hrn","hrf","er","enn","rfnn"]
Output: "hernf"Explanation:
Constraints:
words will contain characters only from lowercase 'a' to 'z'.1 <= words.length <= 1001 <= words[i].length <= 100
You should aim for a solution with O(N + V + E) time and O(V + E) space, where N is the sum of the lengths of all the strings, V is the number of unique characters (vertices), and E is the number of edges.
Can you think of this as a graph problem? Characters from a through z are nodes. What could the edges represent here? How can you create edges from the given words? Perhaps you should try comparing two adjacent words.
The relative ordering of the characters can be treated as edges. For example, consider the words ordered as ["ape", "apple"]. "ape" comes before "apple", which indicates that 'e' is a predecessor of 'p'. Therefore, there is a directed edge e -> p, and this dependency should be valid across all the words. In this way, we can build an adjacency list by comparing adjacent words. Can you think of an algorithm that is suitable to find a valid ordering?
We can use Topological Sort to ensure every node appears after its predecessor. Using DFS, we traverse the graph built from the adjacency list. A visited map tracks nodes in the current DFS path: False means not in the path, and True means in the path. If any DFS call returns True, it indicates a cycle and we return immediately. How do we extract the ordering from this DFS?
When we visit a node and its children and don't find a cycle, we mark the node as False in the map and append it to the result, treating this as a post-order traversal. If we find a cycle, we return an empty string; otherwise, we return the result list.
Before attempting this problem, you should be comfortable with:
The words are already sorted in an unknown alphabet order.
So when you compare two adjacent words, the first position where they differ tells you a rule about letter order:
w1[j] != w2[j], then w1[j] must come before w2[j] in the alien alphabet (w1[j] -> w2[j]).All these rules form a directed graph (letters = nodes, “comes before” = directed edge).
Now the problem becomes: find a topological ordering of this graph.
DFS helps in two ways:
Also, special invalid case:
w1 is longer but w2 is a prefix of w1 (like "abc" before "ab"), that's impossible - return "".w1 starts with w2 and len(w1) > len(w2), return "".w1[j] -> w2[j].dfs from every character:dfs finds a cycle, return ""; else return the joined string.class Solution:
def foreignDictionary(self, words: List[str]) -> str:
adj = {c: set() for w in words for c in w}
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
minLen = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:minLen] == w2[:minLen]:
return ""
for j in range(minLen):
if w1[j] != w2[j]:
adj[w1[j]].add(w2[j])
break
visited = {}
res = []
def dfs(char):
if char in visited:
return visited[char]
visited[char] = True
for neighChar in adj[char]:
if dfs(neighChar):
return True
visited[char] = False
res.append(char)
for char in adj:
if dfs(char):
return ""
res.reverse()
return "".join(res)Where is the number of unique characters, is the number of edges and is the sum of lengths of all the strings.
From the sorted alien words, each pair of adjacent words gives you a letter-order rule at the first mismatching character:
w1[j] != w2[j], then w1[j] -> w2[j] (meaning w1[j] comes before w2[j]).These rules form a directed graph. The alien alphabet is just a topological ordering of this graph.
Kahn's algorithm (BFS topological sort) works by:
indegree).indegree = 0 (no unmet prerequisites) and "removing" them from the graph.If there's a cycle, some letters will never reach indegree 0, so we won't be able to output all letters.
Also invalid input case:
"abc" before "ab"), ordering is impossible.(w1, w2):w1 starts with w2 and len(w1) > len(w2), return "".j where they differ and add edge w1[j] -> w2[j] (only once).indegree[w2[j]] when you add a new edge.indegree = 0 into a queue.indegree.0, push it into the queue."".class Solution:
def foreignDictionary(self, words):
adj = {c: set() for w in words for c in w}
indegree = {c: 0 for c in adj}
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
minLen = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:minLen] == w2[:minLen]:
return ""
for j in range(minLen):
if w1[j] != w2[j]:
if w2[j] not in adj[w1[j]]:
adj[w1[j]].add(w2[j])
indegree[w2[j]] += 1
break
q = deque([c for c in indegree if indegree[c] == 0])
res = []
while q:
char = q.popleft()
res.append(char)
for neighbor in adj[char]:
indegree[neighbor] -= 1
if indegree[neighbor] == 0:
q.append(neighbor)
if len(res) != len(indegree):
return ""
return "".join(res)Where is the number of unique characters, is the number of edges and is the sum of lengths of all the strings.
When a longer word appears before its own prefix (e.g., "abc" before "ab"), this is an invalid ordering that cannot exist in any alphabet. Failing to detect this case and return an empty string leads to incorrect results or undefined behavior.
The ordering information comes from the first differing character between adjacent words. A common mistake is comparing all differing characters or only comparing the first characters. You must find the first position where characters differ and extract exactly one edge from that comparison.
Characters that appear in the words but have no ordering constraints (no incoming or outgoing edges) must still appear in the final alphabet. Forgetting to initialize nodes for all unique characters causes some letters to be missing from the output.
The three-state visited tracking (unvisited, visiting, visited) is crucial for detecting cycles. Using only two states (visited/unvisited) cannot distinguish between a back edge (cycle) and a cross edge (already processed node). This leads to either false cycle detection or missing actual cycles.
When the same character pair appears from multiple adjacent word comparisons, adding duplicate edges inflates the indegree count in Kahn's algorithm, preventing nodes from ever reaching zero indegree. Use a set to track existing edges before adding new ones.